Preface


The enormous progress over the last decades in our understanding of the 
mechanisms behind the complex system “Earth” is to a large extent based 
on the availability of enlarged data sets and sophisticated methods for their 
analysis. Univariate as well as multivariate time series are a particular class 
of such data which are of special importance for studying the dynamical
processes in complex systems. Time series analysis theory and applications in
geo- and astrophysics have always been mutually stimulating, starting with 
classical (linear) problems like the proper estimation of power spectra, which 
has been put forward by Udny Yule (studying the features of sunspot activity) 
and, later, by John Tukey. 

In the second half of the 20th century, more and more evidence has been 
accumulated that most processes in nature are intrinsically non-linear and 
thus cannot be sufficiently studied by linear statistical methods. With
mathematical developments in the fields of dynamical systems theory, exemplified
by Edward Lorenz’s pioneering work, and fractal theory, starting with the early
fractal concepts inferred by Harold Edwin Hurst from the analysis of
geophysical time series, nonlinear methods became available for time series
analysis as well. Over the last decades, these methods have attracted increasing
interest in various branches of the earth sciences. The world’s leading associations
of geoscientists, the American Geophysical Union (AGU) and the European 
Geosciences Union (EGU) have reacted to these trends with the formation of 
special nonlinear focus groups and topical sections, which are actively present 
at the corresponding annual assemblies. 

Surprisingly, although nonlinear methods have meanwhile become an
established, but still developing, toolbox for the analysis of geoscientific time
series, so far there has not been a book giving an overview of corresponding
applications of these methods. The aim of this volume is therefore to close this
apparent gap between the numerous excellent books on (i) geostatistics and
the “traditional” (linear) analysis of geoscientific time series, (ii) the
nonlinear modelling of geophysical processes, and (iii) the theory of nonlinear
time series analysis.





This volume contains a collection of papers that were presented in a topical 
session on “Applications of Nonlinear Time Series Analysis in the Geosciences” 
at the General Assembly of the European Geosciences Union in Vienna from 
April 15-20, 2007. More than 30 colleagues from various countries used this
opportunity to present and discuss their most recent results on the analysis of
time series arising from problems in the fields of climatology, atmospheric
sciences, hydrology, seismology, geodesy, and solar-terrestrial physics. Oral
and poster sessions included a total of 38 presentations, which attracted the 
interest of many colleagues working both theoretically on and practically with 
nonlinear methods of time series analysis in the geosciences. The feedback from 
both presenters and audience has encouraged us to prepare this volume, which 
is dedicated to both experts in nonlinear time series analysis and practitioners 
in the various geoscientific disciplines who are in need of novel and advanced 
analysis tools for their time series. In this volume, presentations shown at the 
conference are complemented by invited contributions written by some of the 
most distinguished colleagues in the field. 

In order to allow the interested reader to easily find methods that are
suitable for their particular problems or questions, we have decided to arrange
this book in three parts that comprise typical applications from the fields of
climatology, geodynamics, and solar-terrestrial physics, respectively. However,
especially in the latter case, the assignment of the different subjects has not
always been unique, as there are obvious and rather strong links to the two
other fields. Moreover, we would like to note that there are methods whose
application has already become very common for studying problems from
any of these fields.

The first 7 chapters deal with problems from climatology and the
atmospheric sciences. A. Gluhovsky discusses the potential of subsampling for the
analysis of atmospheric time series, which usually cannot be described by
a simple linear stochastic model. In such cases, traditional estimates of
even very simple statistics can be significantly biased, a problem that can be
solved by using subsampling methods. J. Miksovsky, P. Pisoft, and A. Raidl
report results on the spatial patterns of nonlinearity in simulations of global
circulation models as well as reanalysis data. S. Hallerberg, J. Brocker, and
H. Kantz discuss different methods for the prediction of extreme events, a
challenging problem of contemporary interest in various geoscientific disciplines.
D.B. Percival presents an overview of the use of the discrete wavelet
transform for the analysis of climatological time series, with special consideration
of ice thickness and oxygen isotope data. G.S. Duane and J.P. Hacker describe
a framework for automatic parameter estimation in atmospheric models based
on the theory of synchronisation. W.W. Hsieh and A.J. Cannon report on
recent improvements in nonlinear generalisations of traditional multivariate
methods like principal component analysis and canonical correlation analysis,
which are based on the application of neural networks and allow the extraction
of nonlinear, dynamically relevant components. R. Donner, T. Sakamoto, and
N. Tanizuka discuss methods for quantifying the complexity of multivariate
time series, and how such concepts can be used to study variations and
spatio-temporal dependences of climatological observables. As a particular example,
the case of Japanese air temperature records is considered.

The next 5 chapters describe the analysis of time series in the fields of 
oceanography and seismology. S.M. Barbosa, M.E. Silva, and M.J. Fernandes 
discuss the issue of characterising the long-term variability of sea-level records 
in the presence of nonstationarities, trends, or long-term memory. A. Ardalan 
and H. Hashemi describe a framework for the empirical modelling of global 
ocean tide and sea level variability using time series from satellite altimetry. 
J.A. Hawkins, A. Warn-Varnas, and I. Christov use different linear as well as 
nonlinear Fourier-type techniques for the analysis of internal gravity waves 
from oceanographic time series. M.E. Ramirez, M. Berrocoso, M.J. Gonzalez, 
and A. Fernandez describe a time-frequency analysis of GPS data from the 
Deception Island Volcano (Southern Shetland Islands) for the estimation of 
local crustal deformation. A. Jimenez, A.M. Posadas, and K.F. Tiampo use 
a cellular automaton approach to derive a simple statistical model for the 
spatio-temporal variability of seismic activity in different tectonically active 
regions. 

The final 4 chapters discuss problems related to dynamical processes on
the Sun and their relationship to the complex system “Earth”. I.M. Moroz
uses a topological method, the so-called template analysis, to study the
internal structure of chaos in the Hide-Skeldon-Acheson dynamo, and
compares her results with those for the well-known Lorenz model. N.G. Mazur,
V.A. Pilipenko, and K.-H. Glassmeier describe a framework for the
analysis of solitary wave signals in geophysical time series, particularly satellite
observations of electromagnetic disturbances in the near-Earth environment.
M. Palus and D. Novotna introduce a nonlinear generalisation of singular
spectrum analysis that can be used to derive dynamically meaningful
oscillatory components from atmospheric, geomagnetic, and solar variability signals.
Finally, R. Donner demonstrates the use of phase coherence analysis for
understanding the long-term dynamics of the north-south asymmetry of sunspot
activity.

We would like to express our sincerest thanks to those people who made
the idea of this book a reality: the authors, who prepared their excellent
results for publication in this book, and the numerous referees, who helped
us evaluate the scientific quality of all contributions and make them
ready for publication. We also acknowledge the support of Springer at all
stages during the preparation of this book. We very much hope that it will
inspire many readers in their own scientific research.


Dresden / Porto, 
January 2008 


Reik Donner 
Susana Barbosa 
Part I


Applications in Climatology 
and Atmospheric Sciences 



Subsampling Methodology for the Analysis 
of Nonlinear Atmospheric Time Series 


Alexander Gluhovsky 

Department of Earth & Atmospheric Sciences, Department of Statistics, and 
Purdue Climate Change Research Center (PCCRC), Purdue University, West 
Lafayette, Indiana 47907, USA, aglu@purdue.edu


Abstract. This contribution addresses the problem of obtaining reliable
statistical inference from meteorological and climatological records. The common
practice is to choose a linear model for the time series, then compute
confidence intervals (CIs) for its parameters based on the estimated model. It is
demonstrated that such CIs may become misleading when the underlying data
generating mechanism is nonlinear, while the computer intensive subsampling
method provides an attractive alternative (including situations when linear
models are entirely out of place, e.g., when constructing CIs for the
skewness).


Keywords: Nonlinear time series, Confidence intervals, Subsampling, 
Skewness 


1 Introduction 

Conventional statistical methods are commonly based on strong assumptions
that are rarely met in atmospheric data sets (e.g., [1]). These include the
assumption that observations follow a normal (Gaussian) distribution, or the
assumption of a linear model for the observed time series. In fact,
distributions of many meteorological and climatological variables are not normal,
such as the velocity field in a turbulent flow [2, 3], the precipitation amount, or
economic damage from extreme weather events [4, 5]. At the same time,
it has become clear that while departures from normality and nonlinearities in
real data generating mechanisms (DGMs) may render conventional statistical
inference misleading, on the positive side, computer intensive resampling
methods [6, 7, 8, 9] could bring about valid results without imposing
questionable assumptions on DGMs (e.g., [10, 11]). This work describes how one such
method, subsampling [8], can be used in practice to obtain reliable inference
from observed and modeled time series.

As a motivating example, consider the time series of the vertical velocity 
of wind recorded from an aircraft (50 m above Lake Michigan, 70 m/s flight 
speed, 20 Hz sampling rate) during an outbreak of a polar air mass over 
the Great Lakes region [12]. Figure 1 shows a record of 4096 data points 
(corresponding to about 14.3 km) that has passed the test for stationarity [13]. 
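
As a quick check of the quoted record length, the distance follows from the number of samples, the sampling rate, and the flight speed given above. The variable names in this sketch are ours, not the author's.

```python
# Flight parameters quoted in the text.
n_points = 4096           # samples in the record
sampling_rate_hz = 20.0   # samples per second
speed_m_per_s = 70.0      # aircraft speed

duration_s = n_points / sampling_rate_hz            # time spanned by the record
distance_km = duration_s * speed_m_per_s / 1000.0   # distance flown in that time

print(f"{duration_s:.1f} s, {distance_km:.1f} km")  # 204.8 s, 14.3 km
```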



Fig. 1: Record of 20-Hz aircraft vertical velocity measurements. 


The sample mean, variance, and skewness of the vertical velocity computed 
from the record are, respectively, 0.03, 1.11, and 0.84. Sample characteristics 
like these (routinely obtained in field programs as well as in laboratory
experiments and computer simulations) are just point estimates (our “best guesses”)
of the true values of the parameters, so confidence intervals (CIs) are duly
employed to learn how much importance is reasonable to attach to such
numbers. However, since the DGM is usually unknown, the common practice is
to assume a linear parametric model for it, then estimate the model from the 
observed record, and compute CIs for parameters of the underlying time series 
from the estimated model. 

In Sect. 2, it is demonstrated via Monte Carlo simulations of a model
nonlinear time series that nonlinearities in the real DGM may render
useless the inference (90% CIs for the variance) based on the estimated linear
parametric model. Then in Sect. 3, we show how the subsampling method [8]
allows one to avoid time series analysis anchored in parametric models with
imposed perceived physical assumptions. Previously [11, 14], we have
considered subsampling CIs for the mean and variance of explicit nonlinear time
series as well as the related enhancements of the subsampling methodology.
Completely new in this paper are the analysis of the coverage of conventional
CIs in the case of implicit nonlinear time series (those retaining the general
ARMA form but excited by nonnormal white noise [15]) and the construction
of CIs for the skewness from a single record of limited length. The latter is
of particular importance since CIs for the skewness cannot be obtained from
linear models, while nonzero skewness and limited record lengths are frequent
attributes of atmospheric time series, especially of those relevant to extreme
weather and climate events.


2 Inadequacy of Linear Models 

2.1 The Model 

The data were generated using variations of the following nonlinear model [16],

$Y_t = X_t + a(X_t^2 - 1)$, (1)

where $X_t$ is a first order autoregressive process (AR(1)),

$X_t = \phi X_{t-1} + \varepsilon_t$, (2)

$0 < \phi < 1$ is a constant, and $\varepsilon_t$ is a white noise process (a sequence of
uncorrelated random variables with mean 0 and variance $\sigma_\varepsilon^2$).

AR(1) with a Gaussian white noise is widely employed in studies of climate 
as a default model for correlated time series (e.g., [17, 18]). When the white 
noise in model (2) is not Gaussian, the model may exhibit nonlinear behavior 
and is referred to as an implicit nonlinear model [15], as opposed to an explicit 
nonlinear model (1). 
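
A minimal sketch of how records from models (1) and (2) could be simulated, using only the Python standard library. The function names, the seed, and the choice of noise variance $1 - \phi^2$ (which gives $X_t$ unit variance, as adopted later in Sect. 2.2) are assumptions made for illustration; this is not the author's code.

```python
import math
import random


def simulate_ar1(n, phi, rng):
    """AR(1) record X_t = phi * X_{t-1} + eps_t with Gaussian noise.

    Assumption: the noise variance is 1 - phi**2, so that Var(X_t) = 1.
    """
    sigma_eps = math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, 1.0)          # start from the stationary distribution
    record = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma_eps)
        record.append(x)
    return record


def simulate_nonlinear(n, phi, a, rng):
    """Model (1): Y_t = X_t + a * (X_t**2 - 1), built on an AR(1) record."""
    return [x + a * (x * x - 1.0) for x in simulate_ar1(n, phi, rng)]


rng = random.Random(1)               # fixed seed, for reproducibility only
y = simulate_nonlinear(1024, phi=0.67, a=0.14, rng=rng)
print(len(y))                        # 1024
```

Setting `a=0.0` returns the linear AR(1) record itself, which is convenient for comparing the linear and nonlinear cases on the same noise sequence.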

If a record of length $n$ is generated from model (2) with a Gaussian white
noise (called a linear model here), then, say, a 90% CI for the variance of
$X_t$ is given by

$\hat{\sigma}_X^2 \pm 1.645\,\hat{\sigma}_X^2 \left( \frac{2}{n} \cdot \frac{1 + \phi^2}{1 - \phi^2} \right)^{1/2}$, (3)

where $\hat{\sigma}_X^2$ is the sample variance, an estimate of the true variance of $X_t$,

$\sigma_X^2 = \sigma_\varepsilon^2 / (1 - \phi^2)$. (4)

Equation (3) follows from the fact that $\hat{\sigma}_X^2$ is asymptotically normal with
mean $\sigma_X^2$ and variance $(2/n)\,\sigma_X^4\,(1 + \phi^2)/(1 - \phi^2)$ [19]. Parameters $\phi$ and $\sigma_\varepsilon$
are generally estimated from the data. Using our model example, it will be
demonstrated in Sects. 2.2 and 2.3 that the common practice of fitting a linear
model to data that are generated (unknown to us) by an explicit or implicit
nonlinear model may result in invalid CIs, even though customary postfitting
diagnostic checking indicates that the model provides an adequate description
of the data.
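
The conventional interval of Eq. (3) can be sketched in a few lines. The lag-one sample autocorrelation is used here as a moment estimate of $\phi$, and the sample variance divides by $n$; both are assumptions, since the chapter does not state which estimators were used. The function name is hypothetical.

```python
import math


def conventional_variance_ci(record, z=1.645):
    """Approximate 90% CI for the variance under a fitted AR(1) model, Eq. (3)."""
    n = len(record)
    mean = sum(record) / n
    dev = [v - mean for v in record]
    var_hat = sum(d * d for d in dev) / n                  # sample variance
    # Lag-one autocorrelation as a moment estimate of phi (assumption).
    phi_hat = sum(dev[t] * dev[t - 1] for t in range(1, n)) / (n * var_hat)
    half_width = z * var_hat * math.sqrt(
        (2.0 / n) * (1.0 + phi_hat ** 2) / (1.0 - phi_hat ** 2)
    )
    return var_hat - half_width, var_hat + half_width
```

The interval is symmetric about the sample variance, and its width depends on the data only through the sample variance and the estimate of $\phi$.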

Also, a linear model may match the first two moments (mean and variance)
of the observed time series, but such a model has zero skewness, while a
nonlinear model may be capable of matching all three moments. Note that at
$a = 0.14$, the mean, variance, and skewness of $Y_t$ (nonlinear model (1)) are,
respectively, 0, $1 + 2a^2 \approx 1.04$, and $6a + 8a^3 \approx 0.86$, i.e., close to the corresponding
sample characteristics (0.04, 1.11, and 0.84) of the vertical velocity time series
discussed in Sect. 1. Thus, $Y_t$ might provide a better description for that series
than linear models. At the same time, the subsampling method (see Sect. 3)
may not require that any model, linear or nonlinear, be fitted to the time
series under study to obtain reasonable CIs, though an approximating model
is often desirable to make subsampling CIs more accurate.
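
The quoted moments can be checked numerically. For a standard normal $X_t$ (so $E X_t^4 = 3$ and $E X_t^6 = 15$), expanding the powers of $X_t + a(X_t^2 - 1)$ gives a variance of $1 + 2a^2$ and a third central moment of $6a + 8a^3$. The sketch below evaluates both at $a = 0.14$; the standardized value in the last line (third moment divided by the variance to the power 3/2) is our addition for comparison and is not quoted in the chapter.

```python
def model_moments(a):
    """Variance and third central moment of Y_t for unit-variance Gaussian X_t."""
    variance = 1.0 + 2.0 * a ** 2
    third_moment = 6.0 * a + 8.0 * a ** 3
    return variance, third_moment


variance, third = model_moments(0.14)
standardized = third / variance ** 1.5   # not from the source, shown for comparison

print(round(variance, 2), round(third, 2), round(standardized, 2))  # 1.04 0.86 0.81
```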

2.2 Monte Carlo Simulations and Actual Coverage of 
Conventional CIs for the Variance of an Explicit Nonlinear Model 

A 90% CI is the range of numbers that traps an unknown parameter with
probability 0.90, called the coverage probability. This implies that if, instead of
the one time series record commonly available in practice, records could be
generated over and over, and from each record a 90% CI was computed, then
90% of the resulting CIs would contain the parameter. Such a coverage
probability (often referred to as a nominal or target coverage probability, e.g., [7]) is
attained only if all assumptions underlying the method for the CI construction
are met. This is the case for CIs (3) when the data are generated from linear
model (2). In geosciences, however, such assumptions are rarely met, so that
the actual coverage probability may differ from the target level (sometimes
considerably, as is demonstrated below). Intervals with confidence levels other
than 90% (e.g., 95% or 99%) are also often used in various applications (the
higher the confidence level, the wider the CI).

Using the above probabilistic interpretation, the actual coverage
probability could be determined through Monte Carlo simulations when the DGM is
known. Although this assumption is unrealistic, Monte Carlo simulations
of models possessing statistical properties shared by real processes may still
provide valuable information on what can be expected in situations of practical
interest. With all this in mind, Monte Carlo simulations of the nonlinear model
(1) were implemented as follows. First, 1000 records, each of 1024
observations, were generated from the model with $\phi = 0.67$ and Gaussian white noise
with zero mean and variance $\sigma_\varepsilon^2 = 1 - \phi^2 = 0.5511$ (which makes $\sigma_X^2 = 1$).
The choice of 1024 observations was motivated by the practical implementation
of subsampling in Sect. 3.3 requiring a record length to be a power of 2
($1024 = 2^{10}$), and because at the chosen value of $\phi$, about 1000 data points
from model (1) allow the same accuracy in the estimation of variance as 400
independent normal observations (see, e.g., [19]). Note also that while the
variance and skewness of $Y_t$ grow with $a$ (see the last paragraph in Sect. 2.1),
the effective sample size does not increase due to nonlinearities: linearly
filtering time series data to remove correlation results in white noise that
nevertheless retains the nonlinear structure of the original time series [20].
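
The figure of roughly 400 equivalent independent observations can be traced to the variance formula quoted with Eq. (3). For $m$ independent normal observations the sample variance has variance of about $2\sigma^4/m$, so equating this with $(2/n)\,\sigma_X^4\,(1 + \phi^2)/(1 - \phi^2)$ gives $m = n(1 - \phi^2)/(1 + \phi^2)$. The sketch below evaluates this for the linear AR(1) case; the variable names are ours.

```python
phi = 0.67
n = 1024

sigma_eps_sq = 1.0 - phi ** 2                        # noise variance giving Var(X_t) = 1
inflation = (1.0 + phi ** 2) / (1.0 - phi ** 2)      # variance inflation due to correlation
n_effective = n / inflation                          # equivalent number of independent points

print(round(sigma_eps_sq, 4), round(n_effective))    # 0.5511 389
```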

Next, pretending that, as in reality, the data generating mechanism was
unknown, an AR(1) model was fitted to each realization, the goodness of fit
confirmed by commonly employed diagnostic checking procedures, and the
90% CIs were computed using (3). After that, from the resulting set of 1000
CIs, the actual coverage probability was determined by counting the fraction
of times the variance of $Y_t$, $V = 1 + 2a^2$, was covered by the CIs. Finally, the
procedure was repeated for various values of $a$: 0.00, 0.05, 0.10, 0.15, 0.20, 0.25,
0.30, 0.35.
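
The procedure just described can be sketched as follows, with the standard library only. The sketch omits the diagnostic checking step, estimates $\phi$ by the lag-one autocorrelation, and uses hypothetical function names and an arbitrary seed, so its numbers need not reproduce the published figure exactly.

```python
import math
import random


def simulate_model1(n, phi, a, rng):
    """One record of Y_t = X_t + a*(X_t**2 - 1) with unit-variance AR(1) X_t."""
    sigma_eps = math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma_eps)
        y.append(x + a * (x * x - 1.0))
    return y


def ar1_variance_ci(y, z=1.645):
    """Conventional 90% CI for the variance from a fitted AR(1) model, Eq. (3)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    var_hat = sum(d * d for d in dev) / n
    phi_hat = sum(dev[t] * dev[t - 1] for t in range(1, n)) / (n * var_hat)
    half = z * var_hat * math.sqrt(2.0 / n * (1 + phi_hat ** 2) / (1 - phi_hat ** 2))
    return var_hat - half, var_hat + half


def conventional_coverage(a, n_records=1000, n=1024, phi=0.67, seed=0):
    """Fraction of records whose CI covers the true variance V = 1 + 2a^2."""
    rng = random.Random(seed)
    true_var = 1.0 + 2.0 * a * a
    hits = 0
    for _ in range(n_records):
        lo, hi = ar1_variance_ci(simulate_model1(n, phi, a, rng))
        hits += lo <= true_var <= hi
    return hits / n_records


if __name__ == "__main__":
    for a in (0.0, 0.15, 0.35):
        print(a, conventional_coverage(a, n_records=200))
```

Using fewer records than the 1000 of the text shortens the run at the cost of a noisier coverage estimate.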

The results are shown in Fig. 2. Not surprisingly, at $a = 0$ (when the data
generating model is linear) the actual coverage probability of CI (3) is about
the nominal value, 0.90. In contrast, with growing nonlinearity (characterized
by increasing $a$) the actual coverage rapidly falls below the target value,
making such CIs misleading [11].





Fig. 2: Actual coverage probabilities (CP) of 90% CIs (3) for the variance of time 
series (1) at various values of nonlinearity constant a. The horizontal line indicates 
the target 0.90 coverage. 


2.3 Actual Coverage of Conventional CIs for the Variance 
of Implicit Nonlinear Model 

Consider again model (1), this time assuming that the white noise is no longer 
Gaussian, but follows a Student’s t distribution (thus introducing nonlinearity 
implicitly [15]). Figure 3 shows PDFs of t distribution with three (short- 
dashed) and six (long-dashed) degrees of freedom, as well as that of standard 
normal distribution (solid). Similar to the standard normal curve, the t curves 
are symmetric about zero, and they become practically indistinguishable from 
the standard normal at larger degrees of freedom. But of importance for the 
problem discussed here is that the tails of the t curves lie above the tails of 
the normal. 

As revealed by Monte Carlo simulations of model (1) with $a = 0$ and a
Student’s t white noise (i.e., implicit nonlinear model (2)), such heavier tails
result in poor performance of CIs (3). Namely, their actual coverage in the case
of the t white noise with six and three degrees of freedom is about 0.83 and
0.43, respectively, as opposed to nearly 0.90 for model (2) with the Gaussian
white noise (see Sect. 2.2). This is also a warning against the use of AR models for
fitting heavy-tailed data from extreme events.
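
A sketch of how the implicit nonlinear model could be simulated with the standard library. A Student's t variate is built as a standard normal divided by the square root of an independent chi-square variate over its degrees of freedom. The chapter does not say whether the t noise was rescaled; here it is rescaled so that $X_t$ keeps unit variance, which requires more than two degrees of freedom. The function names and the burn-in length are assumptions.

```python
import math
import random


def student_t(df, rng):
    """One Student's t draw with df degrees of freedom: Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate_ar1_t(n, phi, df, rng, burn_in=200):
    """AR(1) record driven by t noise, rescaled so that Var(X_t) = 1 (df > 2)."""
    # Var(t_df) = df / (df - 2); the scale below sets the noise variance to 1 - phi**2.
    scale = math.sqrt((1.0 - phi * phi) * (df - 2.0) / df)
    x = 0.0
    record = []
    for t in range(n + burn_in):
        x = phi * x + scale * student_t(df, rng)
        if t >= burn_in:                      # discard the start-up transient
            record.append(x)
    return record
```

Passing such records to a routine that computes the conventional interval of Eq. (3) would give a coverage experiment of the kind described above.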



Fig. 3: Probability density functions for standard normal (solid) and Student’s t 
distributions with 3 (short-dashed) and 6 (long-dashed) degrees of freedom. 


In addition to heavy tails, another source of trouble for the validity of CIs 
(3) is a nonzero skewness, cf. [10]. The same experiment with model (2), now 
with the white noise following a skewed distribution (lognormal (0,1)), has 
resulted in the actual coverage of CIs (3) of just 0.35. 

Figure 4 shows that nonnormal noise exacerbates the effect of explicit 
nonlinearity on the actual coverage of CIs (3) when data are generated from 
the general model (1). The lower curve in Fig. 4 (corresponding to the t white 
noise with six degrees of freedom) differs markedly from the upper curve 
(corresponding to the Gaussian white noise) that was taken from Fig. 2. 


3 Subsampling Confidence Intervals 

Fig. 4: Same as in Fig. 2 with the lower solid curve added that corresponds to
Student’s t white noise with 6 degrees of freedom in Eq. (2).

As an alternative, consider subsampling [8], a computer-intensive method that
works under even weaker assumptions than other bootstrap techniques (it only
requires a nondegenerate limiting distribution for the properly normalized
statistic of interest), thus delivering us from having to rely on questionable
assumptions about data. Subsampling is based on the values of the statistic of
interest recomputed over subsamples of the record of the time series, $Y_t$, i.e.,
blocks of consecutive observations of the same length $b$ (block size) sufficient
to retain the dependence structure of the time series. Three blocks of size $b$
(the first, the $i$th, and the last) are marked in the record below, which contains
$n$ observations and, therefore, $n - b + 1$ blocks:

$(\underbrace{Y_1, \ldots, Y_b}_{b}, \ldots, \underbrace{Y_i, \ldots, Y_{i+b-1}}_{b}, \ldots, \underbrace{Y_{n-b+1}, \ldots, Y_n}_{b})$. (5)
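
The block structure can be made concrete with a short sketch; the function name and the toy record are ours.

```python
def blocks(record, b):
    """All n - b + 1 overlapping blocks of b consecutive observations."""
    n = len(record)
    return [record[i:i + b] for i in range(n - b + 1)]


record = list(range(1, 11))        # stand-in for Y_1, ..., Y_10
subsamples = blocks(record, b=4)

print(len(subsamples))             # 7, i.e. 10 - 4 + 1
print(subsamples[0], subsamples[-1])
```

Because consecutive blocks overlap in all but one observation, each block keeps the serial dependence of the original record over lags shorter than the block size.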

When the DGM (the model) is known, one could (following the probabilistic
interpretation of CIs in the beginning of Sect. 2.2) generate a very large
number of realizations, compute the sample variance from each realization,
estimate the 0.05 and 0.95 quantiles of its distribution and use them as the
lower and upper confidence limits of a 90% (percentile) CI. In practice, i.e.,
when the model is unknown and usually only one record of the observed time
series is available, subsampling comes to the rescue by replacing computer
generated realizations from the known model with subsamples of the single
existing record.
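
The chapter does not spell out the interval formula, so the sketch below follows the construction commonly given in the subsampling literature: the distribution of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ is approximated by the empirical distribution of $\sqrt{b}\,(\hat{\theta}_{b,i} - \hat{\theta}_n)$ over all blocks, and its quantiles are turned into an equal-tailed interval. The root-$n$ rate, the function names, and the quantile rule are assumptions; the author's implementation may differ in detail.

```python
import math


def sample_variance(values):
    """Sample variance with divisor n."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def subsampling_ci(record, b, statistic=sample_variance, level=0.90):
    """Equal-tailed subsampling CI, assuming a root-n convergence rate."""
    n = len(record)
    theta_n = statistic(record)
    # Centred and rescaled block statistics stand in for the unknown
    # sampling distribution of sqrt(n) * (theta_n - theta).
    roots = sorted(
        math.sqrt(b) * (statistic(record[i:i + b]) - theta_n)
        for i in range(n - b + 1)
    )
    alpha = 1.0 - level

    def quantile(q):
        index = math.ceil(q * len(roots)) - 1
        return roots[min(len(roots) - 1, max(0, index))]

    lower = theta_n - quantile(1.0 - alpha / 2.0) / math.sqrt(n)
    upper = theta_n - quantile(alpha / 2.0) / math.sqrt(n)
    return lower, upper
```

Any statistic that maps a list of numbers to a single value can be passed as `statistic`, so the same routine serves for the variance and for the skewness.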

In Monte Carlo simulations to determine the actual coverage probabilities
of subsampling CIs, there is no need to fit a model to the data. In other
respects, the simulations below were carried out as described in Sect. 2.2:
from each of 1000 realizations of $Y_t$, a subsampling CI for the variance or
the skewness of $Y_t$ was computed, then the actual coverage probability was
determined by counting the fraction of times the known value of the parameter
was covered by the CIs.

The results of Monte Carlo simulations demonstrate the superiority of
subsampling CIs over conventional CIs in estimating the variance of a nonlinear
time series (Sect. 3.1), as well as in estimating the skewness (Sect. 3.2),
where linear models are inapplicable. The choice of block size $b$ is treated in
Sect. 3.3.

3.1 Actual Coverage of Subsampling CIs for the Variance

Coverage probabilities of 90% subsampling CIs for the variance of $Y_t$ in model
(1) are presented in Fig. 5 by a long-dashed curve. The plummeting solid
curves are taken from Fig. 4; they show the coverage of conventional CIs (3)
for the variance of $Y_t$.

The diminishing actual coverage of CIs (3) is due to the fact that they
fail to grow noticeably with $a$. In contrast, subsampling CIs widen with
increasing $a$, so that their coverage remains practically the same (and close
to the target of 0.90) for all values of $a$. Calibration [8] makes it possible to
achieve even better coverage. That is, one might replace the nominal 90%
CIs providing the actual coverage of 0.86 (at $a = 0$) with nominal 95% CIs
providing the actual coverage of 0.90 at $a = 0$ and 0.87 at $a = 0.35$, as seen
from the short-dashed line in Fig. 5. In practice, calibration can be carried
out using a model time series that shares certain statistical properties with
the one under study (e.g., model (1) with $a = 0.14$ for the vertical velocity
time series).
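
The calibration step amounts to a lookup: among the nominal levels whose actual coverage has been estimated by simulation, take the smallest one that reaches the target. The sketch below applies this to the two coverage values at $a = 0$ quoted above; the function name and the dictionary layout are ours.

```python
def calibrate(actual_by_nominal, target=0.90):
    """Smallest nominal level whose estimated actual coverage reaches the target."""
    for nominal in sorted(actual_by_nominal):
        if actual_by_nominal[nominal] >= target:
            return nominal
    return None   # none of the tabulated levels is sufficient


# Actual coverage at a = 0 quoted in the text: nominal 0.90 gives 0.86, nominal 0.95 gives 0.90.
print(calibrate({0.90: 0.86, 0.95: 0.90}))   # 0.95
```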



Fig. 5: Actual coverage probabilities (CP) of 90% conventional (solid curves from
Fig. 4) and subsampling (long-dashed curve) CIs for the variance of time series (1) at
various values of nonlinearity constant a. The short-dashed curve shows the result
of calibration; the solid horizontal line indicates the target 0.90 coverage.


3.2 Actual Coverage of Subsampling CIs for the Skewness 

Nonzero skewness is a frequent attribute of atmospheric and climatic time 
series, but CIs for the skewness cannot be obtained from linear models, which 
imply zero skewness. Yet subsampling works here as well. 

In Fig. 6, the long-dashed curve shows the actual coverage of subsampling
CIs for the skewness of time series generated from model (1). It is not as good
as its counterpart in Fig. 5 for the variance, since estimating the skewness
requires much longer records [13, 16] (in our simulations, records of length
$n = 1024$ were used).

When feasible, a simple way to improve the coverage is to increase the
record length. The solid curve in Fig. 6 shows a better coverage, thanks to
longer records with $n = 4096$. Otherwise, calibration can be used: the
short-dashed line demonstrates improved (due to calibration) coverage for the
original records of $n = 1024$ observations (as in Sect. 3.1, nominal 90% CIs in the
subsampling procedure were replaced by nominal 95% CIs).



Fig. 6: Actual coverage probabilities of 90% subsampling CIs for the skewness of
time series (1) at various values of nonlinearity constant a: original (long-dashed)
and calibrated (short-dashed). The solid curve shows the coverage for records four
times longer (4096 observations, not calibrated).


3.3 Block Size Selection 

Figure 7 demonstrates that the actual coverage of subsampling CIs depends 
considerably on the block size b. The two curves in Fig. 7, one (dashed) for 
the variance and the other (solid) for the skewness, were obtained by Monte 
Carlo simulations of model (1), similar to those in Sects. 3.1 and 3.2, but 
with fixed a = 0.14 and varying b. For each curve, there exists a range of 
block sizes (around its maximum) that would be appropriate for subsampling. 
Accordingly, subsampling CIs in Sects. 3.1 and 3.2 were computed with the
optimal block sizes determined in this way for all $a$ (at $a = 0.14$, as seen from
Fig. 7, they were $b = 80$ for the variance and $b = 140$ for the skewness).

In practice, where the model is unknown and typically only one record is
available, the choice of the block size turns out to be the most difficult problem
in subsampling, shared by all blocking methods. The asymptotic result [8],

$b \to \infty$ and $b/n \to 0$ as $n \to \infty$, (6)

that the block size needs to tend to infinity with the sample size but slower,
does not help to choose the block size for relatively short atmospheric and
climatic records.



Fig. 7: Actual coverage probabilities of 90% subsampling CIs at various block sizes
for the variance (dashed curve) and skewness (solid curve) from Monte Carlo
simulations of model (1) at a = 0.14.


We developed, however, another resampling technique that permits
determining the optimal block size from one record [14]. Obviously, we had to
modify the first step in previous simulations, where 1000 independent
realizations, each of $n = 1024$ data points, were generated. These are now replaced
by pseudo realizations from the single available record via the following
version of the circular bootstrap [21]. The record of $n = 1024$ data points is
“wrapped” around a circle, then $p = 2^k < n$ points (say, $p = 32$) on the
circle are chosen at random (following a uniform distribution on the circle) as
starting points for $p$ consecutive segments of a pseudo realization. The length
of each segment is $n/p$, so the pseudo realization is again of length $n$. In the
current implementation of the technique it was convenient to choose both $n$
and $p$ to be powers of 2. The procedure is repeated to generate $N$ such pseudo
realizations, which substitute for the 1000 independent realizations of a model time
series. In [14] this technique was tested on subsampling CIs for the mean of $X_t$
(linear model (2)).
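
One pseudo realization of this circular scheme can be sketched as follows. The function name is hypothetical, and the use of integer starting indices drawn uniformly from the $n$ positions is our reading of "uniform distribution on the circle".

```python
import random


def pseudo_realization(record, p, rng):
    """Concatenate p segments of length n/p read off the record wrapped on a circle."""
    n = len(record)
    if n % p != 0:
        raise ValueError("p must divide the record length")
    segment_length = n // p
    out = []
    for _ in range(p):
        start = rng.randrange(n)      # uniformly chosen starting point
        # Indices are taken modulo n, so a segment may run past the end
        # of the record and continue from its beginning.
        out.extend(record[(start + j) % n] for j in range(segment_length))
    return out


rng = random.Random(0)                # fixed seed, for reproducibility only
record = list(range(1024))            # stand-in for an observed record
pseudo = pseudo_realization(record, p=32, rng=rng)
print(len(pseudo))                    # 1024
```

Repeating the call $N$ times yields the set of pseudo realizations from which coverage curves like those discussed next can be computed.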

The actual coverage of subsampling CIs for the skewness of $Y_t$ (nonlinear
model (1)) obtained from 1000 independent realizations of $Y_t$ at $a = 0.14$
is shown in Fig. 8 by the solid curve (taken from Fig. 7). In practice,
however, with only one realization available, such a curve (which permits one to
choose the appropriate block size) would not be available. What then can be
obtained from its substitute resulting from pseudo realizations?

Each dashed curve in Fig. 8 was computed using a different record of
length $n = 1024$ generated from model (1). As described above, $N = 10000$
pseudo realizations (this results in smooth curves) were obtained from the
record, then the actual coverage was found by counting the fraction of times
the skewness of $Y_t$, $S = 6a + 8a^3$, was covered by the 10000 CIs. The maxima of
the dashed curves vary wildly (depending on the initial record used), so that each
dashed curve typically fails to provide the correct coverage. Nevertheless, the
dashed curves essentially retain the shape of the solid curve, thus indicating
a suitable block size to be used in subsampling.



Fig. 8: Selection of block size for subsampling in case of one available record. Each
dashed curve results from pseudo realizations generated from one different record of
$Y_t$ (nonlinear model (1)). The solid curve (from Fig. 7) shows actual coverage
probabilities of 90% subsampling CIs for the skewness computed from independent
realizations of $Y_t$.


3.4 Vertical Velocity Skewness 

Return now to a real life example: the vertical velocity of the wind shown in 
Fig. 1. From this single record, the curve for the skewness similar to dashed 
curves in Fig. 8 was obtained (see Fig. 9) using the technique described in 
Sect. 3.3. 

This indicated $b = 100$ as a suitable block size. Then subsampling with
$b = 100$ resulted in the following 90% subsampling CI for the skewness of
the vertical velocity time series,

(0.66, 1.02), (7)

which reasonably confirms its positive skewness. Calibration could slightly
modify CI (7), while making its coverage closer to the target.

How should one proceed with a calibration? The high coverage exhibited by
the curve in Fig. 9 at $b = 100$ is too good to be true: another record similar
to that in Fig. 1 may result in a considerably less impressive curve, as seen
from Fig. 8. Since the actual coverage of the subsampling CI, on which the
calibration in Sect. 3.2 was based, remains unknown in real situations, an
approximating nonlinear model may be used.



Fig. 9: An analog of dashed curves in Fig. 8 computed from the record in Fig. 1. 


4 Conclusion 

This study has addressed the problem of obtaining reliable statistical inference
from atmospheric time series. Since these originate from an inherently
nonlinear system, statistical inference based on linear models may be questionable.
To investigate how nonlinearities may affect CIs commonly computed from
estimated linear models, an AR(1) process driven by Gaussian white noise
(typically used as a default model for correlated time series in climate
studies) was altered with (i) a nonlinear component and (ii) a Student’s t white
noise replacing the Gaussian one.

It was demonstrated by Monte Carlo simulations that the actual coverage
probabilities of such common CIs for the variance of our nonlinear model
time series become inadequate. In contrast, CIs for the variance obtained via
the subsampling method proved valid for both the linear and the nonlinear
versions of the model.

Many atmospheric time series are nonstationary, and the subsampling
method is by no means restricted to stationary series [8]. However, to
emphasize the issues central to this work, only stationary time series are treated
here. Besides, atmospheric time series are often considered trend stationary
[22], i.e., modeled as the sum $Y_t = \mu_t + X_t$, where $\mu_t$ is a
deterministic trend and $X_t$ is a stationary process, commonly a linear one.
For example, fitting a linear trend $\mu_t$ to a temperature time series,
Bloomfield [23] selected an AR(4) model for $X_t$. In another study, $Y_t$ was
a polynomial trend plus a Gaussian fractionally differenced noise [24].

For the same reason, problems with the linear-model approach to
constructing CIs for parameters of potentially nonlinear time series are illustrated
using only the simplest linear model, AR(1). Although a higher-order ARMA
model may provide a better fit to an observed time series, no linear model is
capable of producing CIs for the skewness (the focus of this work). In contrast,
employing subsampling has resulted in reasonable CIs for the skewness of the
nonlinear model time series. Subsampling may also be helpful in statistical
analyses of extreme events (though not without difficulties preventing easy
applications [25]), since for extreme value distributions with nonzero
skewness and heavy tails, common CIs based on asymptotic maximum likelihood
fail to capture the real variability [26].


Abstract. We employed selected methods of time series analysis to investigate the
spatial and seasonal variations of nonlinearity in the NCEP/NCAR reanalysis data
and in the outputs of the global climate model HadCM3 of the Hadley Centre. The
applied nonlinearity detection techniques were based on a direct comparison of the
results of prediction by multiple linear regression and by the method of local linear
models, complemented by tests using surrogate data. Series of daily values of relative
topography and geopotential height were analyzed. Although some differences in the
detected patterns of nonlinearity were found, their basic features seem to be
identical for both the reanalysis and the model outputs. Most prominently, the distinct
contrast between weak nonlinearity in the equatorial area and stronger nonlinearity
in higher latitudes was well reproduced by the HadCM3 model. Nonlinearity tends
to be slightly stronger in the model outputs than in the reanalysis data. Nonlinear
behavior was generally stronger in the colder part of the year in the mid-latitudes
of both hemispheres, for both analyzed datasets.


Keywords: Nonlinearity, Reanalysis, Global climate model, Surrogates 


1 Introduction 

The Earth’s climate system, as well as its atmospheric component, is an
intrinsically nonlinear physical system. This nonlinearity is generally reflected
in many series of climatic variables such as atmospheric pressure or
temperature, but whether it is detectable and how strong it is depends on the type
of the variable [1, 2, 3], the geographic area of its origin [2, 4, 5, 6] or the
length of the signal [3]. The manifestations of nonlinearity in time series can be
studied in numerous ways, using different statistics or criteria of the presence of
nonlinear behavior. The techniques applied so far to meteorological data involve
the calculation of the mutual information or persistence [1, 7, 8], statistics
based on the performance of a nonlinear predictive method [3, 4, 9],
nonlinear correlations [10] or the examination of the character of the prediction
residuals [2, 4, 5]. Tests using some form of surrogate data are frequently
employed [1, 2, 3, 4, 7, 9, 10]. The presence of nonlinearity can also be
assessed by comparing the performance of a linear and a nonlinear time series
analysis method. In the atmospheric sciences, such studies are frequently
associated with the application of statistical methods for prediction [6, 11, 12],
or downscaling and postprocessing tasks [6, 13, 14, 15, 16]. Besides employing a
wide spectrum of techniques for the detection of nonlinearity, different authors
have studied diverse types of signals, ranging from various variables related to the
local temperature [1, 3, 6, 7, 10, 13, 14, 15, 16] or pressure [1, 2, 3, 4, 5, 7] to
characteristics of larger-scale dynamics such as the mean hemispheric
available potential energy [8]. The heterogeneity of the methods and datasets applied
by different researchers makes it difficult to directly compare the results and
use them to create a consistent global picture of the geographic variations of
nonlinearity. However, it also seems that there are some systematic
regularities in the spatial distribution of nonlinearity or of the related characteristics
[2, 5, 6, 10, 17]. Here, we investigate this matter further, using a comparison of
the results of linear and nonlinear prediction and tests based on surrogate
data.

A significant portion of the existing studies dealing with the issue of
nonlinearity in time series focus on the analysis of individual scalar signals, typically
employing time-delayed values for the construction of the space of predictors
or phase space reconstruction. Due to the complex behavior the atmosphere
exhibits, and the relatively small size of the available records, the
information content in a single series is limited and often insufficient for an effective
application of nonlinear techniques. But meteorological measurements are
frequently available for more than one variable, and they are carried out at
multiple locations. When a multivariate system is used instead of a single scalar
series, more information about the local state of the climate system can be
obtained. It also seems that multivariate systems exhibit a generally stronger
detectable nonlinear behavior [3]. For these reasons, and because using
multiple input variables is common in many tasks of statistical meteorology and
climatology, we focused on settings with multivariate predictors in this study.
We restricted our attention to just a few of the available variables, defining
the temperature and pressure structure of the atmosphere. The two
illustrative cases presented here are based on forecasts of daily values of the relative
topography 850-500 hPa (which is closely related to the temperature of the
lower troposphere) and of the geopotential height of the 850 hPa level (one of
the variables characterizing the structure of the field of atmospheric pressure).
Along with investigating the character of the series derived from actual
measurements (NCEP/NCAR reanalysis), attention was paid to the potential of
the global climate model HadCM3 to reproduce the structures detected in the
observed data. This should help to assess whether such a simulation is able to
capture not just the basic characteristics of the Earth’s climate, but also the
possible nonlinear features of the respective time series. The utilized datasets
are presented in Sect. 2; the techniques applied to quantitatively evaluate
nonlinearity are described in Sect. 3. Section 4 is devoted to the study of the
spatial variations of nonlinearity. Section 5 focuses on the influence of the
presence of the annual cycle in the series and seasonal changes of the detected
patterns. Finally, in Sect. 6, the results are discussed with regard to their
possible physical cause and practical implications. Color versions of the presented
maps of the geographical distribution of nonlinearity (Figs. 3, 5, 6, 8 and 9)
can be accessed at http://www.miksovsky.info/springer2008.htm.


2 Data 

Direct atmospheric observations and measurements suffer from a number of
potential problems. Their locations are typically unevenly spaced and
coverage of some areas of the Earth is limited. Data from different sources are
often incompatible and sometimes flawed. This restricts the usability of raw
measurements for an analysis such as ours, the goal of which is to derive
globally comparable results. To avoid or reduce the aforementioned problems, we
used a gridded dataset in this study instead of direct measurements: the
NCEP/NCAR reanalysis [18, 19] (hereinafter NCEP/NCAR). The reanalysis
is a dataset derived from measurements at weather stations, as well as
inputs from rawinsondes, meteorological satellites and other sources. The input
observations are processed by a fixed data assimilation system, including a
numerical forecast model, and the resulting series are available in a regular
horizontal grid of 2.5° by 2.5°. Here, daily values of the geopotential height of
the 850 hPa level (hereinafter H850) and 500 hPa level have been employed in
a reduced 5° by 5° horizontal resolution, for the period between 1961 and 2000.
From the values of the geopotential heights, the relative topography 850-500
hPa (RT850-500) has been computed. This quantity describes the thickness of
the layer between the 850 hPa and 500 hPa levels and it is proportional to its
mean virtual temperature. According to the classification used by Kalnay et
al. [18], geopotential heights fall into the A-category of variables, thus
reflecting the character of actual measurements rather than the specific properties of
the model applied to create the reanalysis. A typical example of the analyzed
series of RT850-500 in the equatorial area and in the mid-latitudes is shown
in Fig. 1.

Fig. 1: A section of the analyzed data: Time series of daily values of the relative
topography 850-500 hPa in the equatorial area (0°E, 0°N) and in the mid-latitudes
(0°E, 50°N), for the years 1991 and 1992.


The recently increased interest in climate change instigated an intensive
development of the models of the global climate. These simulations, to be
reasonably realistic, must describe all key components of the climate system as
well as the connections among them. As a result, the models are very complex
and demanding with respect to the required computational resources. But
despite their sophistication, no model is able to mimic the observed climate
with absolute accuracy. A very important task in climate modeling is therefore
validating the models, i.e., assessing their ability to reproduce the real climate.
The common validation procedures are usually based on the basic statistical
characteristics of the model outputs; here, we focus on the ability of a climate
simulation to produce time series with the same nonlinear qualities as the
real climate. For this task, we chose one of the major global climate models,
HadCM3 of the Hadley Centre [20, 21]. The model outputs were used in a
reduced horizontal resolution of 3.75° (longitude) by 5° (latitude). The model
integration employed here was based on the observed concentrations of the
greenhouse gases and estimates of past changes in ozone concentration and
sulfur emissions prior to the year 1990, and the emission scenario SRES B2
afterwards [21]. Since we only used the period from 1961 to 2000 for our
analysis, and there is very little difference among the SRES scenarios in
the 1990s, the specific scenario choice should not be crucial.



Fig. 2: An example of the structure of the pattern of predictors, displayed for the
predictand series located at 0°E, 50°N. Black circles mark the positions of the
predictors; the grid illustrates the reduced horizontal resolution of the NCEP/NCAR
and HadCM3 data used in this study.


3 Methods


3.1 General Settings 


One of the key issues of the multivariate approach to the construction of the 
space of predictors is the selection of a suitable set of input variables. Unlike 
for some simple low-dimensional dynamical systems, a perfect phase-space 
reconstruction is impossible from climatic time series, due to the complexity 
of the underlying system. In the case of practical time series analysis tasks, 
a finite-dimensional local approximation of the phase space may suffice. To 
predict values of a scalar series in some grid point, we used a pre-set pattern 
of predictors, centered on the location of the predictand and spanning 30° in 
longitude and 20° in latitude (Fig. 2). A different configuration of predictors 
was chosen for each of the two tasks presented: In the case of the RT850-500 
forecast, the dimension of the predictor space was N = 18, with 9 values 
of RT850-500 and 9 values of H850 in a configuration shown in Fig. 2. For 
the forecast of H850, 9 predictors were used, all of which were of the H850 
type. Note that, despite the different spatial resolution of the NCEP/NCAR 
reanalysis and the HadCM3 model, the selected pattern of predictors could be 
applied for both of them directly, without interpolating the data to a common 
grid. 

All predictors $x_i(t)$, $i = 1, \dots, N$, were transformed to have zero mean
and standard deviation equal to $\sqrt{\cos\varphi}$, using the linear transformation
$x_i(t) \to \sqrt{\cos\varphi}\,(x_i(t) - \bar{x}_i)/\sigma_i$ ($\varphi$ being the
latitude of the respective grid point, $\bar{x}_i$ the mean value of the predictor
series and $\sigma_i$ its standard deviation). Hence,
the predictor’s variance was proportional to the size of the area
characterized by the corresponding grid point. The presented results were derived from
the outcomes of prediction one day ahead, carried out for grid points located
between 70°N and 70°S (the areas closest to the poles were excluded from
the analysis, due to the severe deformation of the applied spatial pattern of
predictors in high latitudes).
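
The transformation above can be sketched in a few lines of NumPy. The function name `standardize_predictors` and the array layout (time along rows, one column per predictor) are assumptions made for this illustration only.

```python
import numpy as np

def standardize_predictors(X, lat_deg):
    """Scale each predictor to zero mean and std sqrt(cos(latitude)).

    X       : array of shape (T, N), one column per predictor series
    lat_deg : array of shape (N,), latitude of each predictor's grid point
    """
    X = np.asarray(X, dtype=float)
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit std
    return np.sqrt(np.cos(lat)) * Z            # variance proportional to cos(lat)
```

After this scaling the variance of each column equals the cosine of its latitude, which is proportional to the area represented by a cell of a regular latitude-longitude grid.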

3.2 Direct Comparison-Based Approach 

Our primary technique of quantification of nonlinearity was based on a direct
comparison of the root mean square errors (RMSEs) of prediction by a linear
reference method, multiple linear regression, and by its nonlinear counterpart,
the method of local linear models. In the case of linear regression, the value of
the scalar predictand $y$ at time $t + 1$ was computed as a linear combination of
the values of individual predictors $x_i$, $i = 1, \dots, N$, in the previous time step


$\hat{y}(t+1) = a_0 + \sum_{j=1}^{N} a_j x_j(t)$, (1)


where the coefficients $a_j$, $j = 0, \dots, N$, were calculated to minimize the sum
of the squared values of the residuals $y(t) - \hat{y}(t)$.

Even a nonlinear system can be described rather well when the linear
model is applied locally for smaller portions of the phase space instead of a
global linear approximation. This concept has been successfully utilized for the
construction of forecast models for many different types of time series. Several
related studies are reprinted in [22] and the basic principles of the method of
local models are also described, e.g., in [23]. The dynamics in the individual
regions of the input space is approximated by linear mappings based on (1), but
an individual linear predictive model (or a set of coefficients $a_j$, respectively)
is constructed for each value of $t$. To create such a local model, only a certain
number $M$ of the predictors-predictand pairs, representing the states of the
system most similar to the one at time $t$, is employed to compute the
coefficients. The similarity of individual states was quantified here by the distance
of the respective $N$-dimensional vectors of predictors
$\mathbf{x}(t) = (x_1(t), x_2(t), \dots, x_N(t))$, using the Euclidean norm.
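
The two predictors compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, rows of `X_cal` hold the predictor vectors at time t, and the matching entries of `y_cal` hold the predictand at time t + 1.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients (a_0, a_1, ..., a_N) of the linear map (1)."""
    A = np.column_stack([np.ones(len(X)), X])   # column of ones gives a_0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    """Evaluate a_0 + sum_j a_j x_j for one predictor vector x."""
    return coef[0] + np.asarray(x, dtype=float) @ coef[1:]

def predict_local(X_cal, y_cal, x_query, M):
    """Local linear model: fit (1) on the M nearest calibration states only."""
    x_query = np.asarray(x_query, dtype=float)
    dist = np.linalg.norm(X_cal - x_query, axis=1)   # Euclidean distance
    nearest = np.argpartition(dist, M - 1)[:M]       # indices of M closest states
    coef = fit_linear(X_cal[nearest], y_cal[nearest])
    return predict_linear(coef, x_query)
```

A global model is fitted once with `fit_linear` on the whole calibration set, whereas `predict_local` refits the coefficients for every query state, which is what makes the method considerably more expensive.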

To calculate the out-of-sample root mean square error of the prediction,
the analyzed series were divided into two subintervals. The years 1961-1990
were used as a calibration set, i.e., for the computation of the coefficients
of the above described models. These were then tested for the years
1991-2000. The values of RMSE we obtained for the prediction by multiple linear
regression ($RMSE_{MLR}$) and local linear models ($RMSE_{LM}$) were compared
by computing


$SS_{LM} = 1 - (RMSE_{LM}/RMSE_{MLR})^2$, (2)

which will be referred to as the local models’ skill score. Its definition is
based on the commonly used concept of a skill score, described, e.g., in [24].
$SS_{LM}$ vanishes when both methods perform equally well in terms of RMSE
and it equals one for a perfect forecast by local models (presuming that
$RMSE_{MLR} \neq 0$). The number $M$ of predictors-predictand pairs used for
the computation of the coefficients of the local models is one of the adjustable
parameters of the method of local models. Depending on the specific structure
of the local climate system, different values of $M$ may be suitable to minimize
RMSE. Here, local models constructed with $M$ = 250, 500 and 1000 were
tested for each grid point; the variant giving the lowest RMSE was then used
in the subsequent analysis.
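
The skill score (2) and the selection of M can be illustrated with a short sketch. The helper names and the dictionary layout (RMSE of the local models keyed by M) are assumptions introduced for this example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error of a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def skill_score(rmse_lm, rmse_ref):
    """Skill score of eq. (2): 1 - (RMSE_LM / RMSE_ref)**2."""
    return 1.0 - (rmse_lm / rmse_ref) ** 2

def best_local_skill(rmse_by_M, rmse_mlr):
    """Pick the M with the lowest local-model RMSE and return (M, SS_LM)."""
    best_M = min(rmse_by_M, key=rmse_by_M.get)
    return best_M, skill_score(rmse_by_M[best_M], rmse_mlr)
```

For instance, hypothetical RMSE values of 0.9, 0.8 and 0.85 for M = 250, 500 and 1000, compared against a regression RMSE of 1.0, would select M = 500 with a skill score of 0.36.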

3.3 Surrogate Data-Based Approach 

The above described approach yields results which are interesting from a
practical perspective, but, strictly speaking, it only refers to a relation of
two particular techniques, both of which may have their specifics. Another
method, which does not rely on comparing different mappings, exists. It uses
modified series (so-called surrogate series or surrogates), which preserve
selected properties of the original signal, but are consistent with some general
null hypothesis. Here, the hypothesis is that the data originate from a linear
Gaussian process, the output of which may have been modified by a static
monotonic nonlinear filter. The values of a nonlinearity-sensitive statistic are
then compared for the original series and multiple surrogates, and if a
statistically significant difference is detected, the null hypothesis is rejected. It
should be noted that the formal rejection does not necessarily prove the
presence of nonlinearity in the signal, as it can have other causes, such as
nonstationarity of the series or imperfection of the surrogate-generating
procedure. For details see, e.g., [9], where the principles of the surrogate data-based
tests are presented in depth, or [25], where the usability of several methods
of generating surrogates is discussed for various geophysical data.

For each grid point, 10 surrogates were created from the respective
multivariate system of time series. Prediction by the method of local linear
models was carried out for each of the surrogates and an arithmetic average
$RMSE_{surr}$ of the resulting RMSEs was computed. A skill score-based
variable, analogous to (2), was then calculated using the RMSE for the original
series, $RMSE_{LM}$, and $RMSE_{surr}$:

$SS_{surr} = 1 - (RMSE_{LM}/RMSE_{surr})^2$. (3)

In order to keep the computational demands at a reasonable level, the
surrogate data-based analysis was performed just for $M = 250$. Also, the years
1991-2000 were used for both calibration and testing of the mappings. The
surrogate series were generated by the iterative amplitude adjusted Fourier
transform [26] in its multivariate form [9]; the program package TISEAN by
Hegger et al. [27] was applied for this task.
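
To convey the idea behind the surrogate generation, the following is a simplified univariate sketch of the iterative amplitude adjusted Fourier transform. It is not the multivariate TISEAN procedure used in the study, which additionally preserves the cross-correlations between the series; the function name `iaaft`, the fixed iteration count and the seeding argument are assumptions of this example.

```python
import numpy as np

def iaaft(x, n_iter=100, seed=None):
    """Univariate IAAFT surrogate: same values as x, similar power spectrum."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)                      # target value distribution
    target_amp = np.abs(np.fft.rfft(x))        # target Fourier amplitudes
    s = rng.permutation(x)                     # random shuffle as starting point
    for _ in range(n_iter):
        # Step 1: impose the amplitude spectrum of x, keep current phases
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # Step 2: impose the value distribution of x by rank ordering
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s
```

Because the loop ends with the rank-ordering step, the surrogate is an exact permutation of the original values, while its spectrum matches that of the original only approximately.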


4 Spatial Patterns of Nonlinearity 

Figure 3a shows the geographical distribution of the local models’ skill score
$SS_{LM}$, obtained for the NCEP/NCAR RT850-500 forecast. The most
prominent feature of the detected pattern is the strong latitudinal variation of
nonlinearity. Near the equator, only a very small and mostly statistically insignificant
difference between the performance of purely linear regression and local linear
models was found. Nonlinear behavior becomes visibly stronger in the
mid-latitudes, and it is more pronounced on average in the northern hemisphere,
where major nonlinearity was detected for all grid points north of circa 25°N
(Fig. 4). In the southern hemisphere, the strongest nonlinearity was located
in a band approximately between 25°S and 50°S. This structure seems to be
well reproduced by the HadCM3 model (Fig. 3b), although the nonlinear
behavior is slightly stronger in the model data in the northern hemisphere (see
Table 1, columns 1 and 2). The spatial correlation of the $SS_{LM}$ fields for the
NCEP/NCAR and HadCM3 data was evaluated by computing the Pearson
correlation coefficient, after linear interpolation of the HadCM3 data-based
values of $SS_{LM}$ to the 5° by 5° grid of NCEP/NCAR. For the entire area
between 70°N and 70°S, the correlation was 0.91. When just extratropical
areas were taken into account, the resemblance of the $SS_{LM}$ patterns was
stronger in the northern hemisphere than in the southern one (Table 1,
column 3). Similar values of correlation were also obtained when the Spearman
rank-order correlation coefficient was used instead of the Pearson one.


Fig. 3: Geographical distribution of the local models’ skill score $SS_{LM}$, obtained
for the RT850-500 prediction, using the NCEP/NCAR (a) and HadCM3 (b) data.
Diamonds mark the positions of the grid points where daily errors of prediction by
the method of local models were not statistically significantly lower than for linear
regression at the 95% confidence level, according to the one-sided paired sign test.

Aside from the dominant latitudinal dependence, the detected nonlinearity
patterns also exhibited a distinct finer structure. As can be seen in Fig. 3a for
the NCEP/NCAR reanalysis data, local maxima of nonlinearity were found
over Europe, North America, East Asia and the northern part of the Pacific
Ocean, and east of the landmasses of the southern hemisphere. The HadCM3
data yielded a very similar pattern (Fig. 3b). After the average latitudinal
structure was filtered out by subtracting the respective latitudinal averages
from the values of $SS_{LM}$ in every grid point, the spatial correlation of the
NCEP/NCAR and HadCM3 $SS_{LM}$ patterns was still rather high, though the
resemblance of both fields was clearly stronger in the northern hemisphere
(Table 1, column 4).


Fig. 4: Distribution of $SS_{LM}$ in different latitudes, obtained for the RT850-500
prediction (latitude values are positive north of the equator).
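
The comparison of two skill score fields, with and without the average latitudinal dependence, can be sketched as below. The sketch assumes that both fields are already on the same regular grid, stored as arrays of shape (latitudes, longitudes), and it uses an unweighted Pearson correlation; whether any area weighting was applied in the study is not stated, so this is an assumption, as are the function names.

```python
import numpy as np

def remove_zonal_mean(field):
    """Subtract the average over longitudes from every latitude row."""
    field = np.asarray(field, dtype=float)     # shape (n_lat, n_lon)
    return field - field.mean(axis=1, keepdims=True)

def pattern_correlation(a, b):
    """Unweighted Pearson correlation of two fields on a common grid."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])
```

The "Filtered" correlations correspond to calling `pattern_correlation` on the outputs of `remove_zonal_mean`, so that only the structure within each latitude band contributes.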


Table 1: Regional averages of $SS_{LM}$, obtained for the RT850-500 prediction in the
case of the NCEP/NCAR (column 1) and HadCM3 (column 2) data, and spatial
correlations of the NCEP/NCAR and HadCM3 $SS_{LM}$ patterns for the original values
of $SS_{LM}$ (column 3) and after the average latitudinal dependence has been filtered
out (column 4).


Region        $SS_{LM}$                Correlation
              NCEP/NCAR    HadCM3      Original    Filtered

25°N-70°N     0.29         0.34        0.75        0.60
20°S-20°N     0.04         0.06        0.89        0.45
70°S-25°S     0.24         0.23        0.55        0.48


When the results of the H850 prediction were applied as a basis for
nonlinearity detection, a somewhat different pattern emerged (Fig. 5). The basic
latitudinal structure with very weak nonlinearity in the equatorial area was
still present, but other details of the detected structure differed from the ones
found for the RT850-500 prediction. In the northern hemisphere, maximum
values of $SS_{LM}$ were located over the northwestern part of the Atlantic Ocean
and the adjacent part of North America, as well as over the northern part of
the Pacific Ocean. Both these maxima were rather well expressed, while the
rest of the northern hemisphere exhibited weaker nonlinearity. In the southern
hemisphere, the maxima of $SS_{LM}$ were less localized. The overall degree of
nonlinearity was lower than for the RT850-500 prediction (Table 2, columns
1 and 2). The similarity of the patterns obtained from the NCEP/NCAR and
HadCM3 data was again very strong, with a global spatial correlation
of 0.9. The nonlinearity was stronger on average in the HadCM3 outputs
than in the NCEP/NCAR reanalysis. As for the match of the patterns of
$SS_{LM}$ with filtered-out latitudinal dependence, there was still a high positive
correlation, stronger in the northern hemisphere (Table 2, column 4).


Fig. 5: Same as Fig. 3, for the prediction of H850 instead of RT850-500.


Table 2: Same as Table 1, for the prediction of H850 instead of RT850-500.


Region        $SS_{LM}$                Correlation
              NCEP/NCAR    HadCM3      Original    Filtered

25°N-70°N     0.10         0.13        0.84        0.85
20°S-20°N     0.02         0.03        0.71        0.52
70°S-25°S     0.09         0.14        0.75        0.71


As can be seen from Fig. 6, the pattern of nonlinearity obtained for the
RT850-500 prediction by means of surrogate data and expressed through
$SS_{surr}$ is very similar to the one presented above for the direct comparison
technique (Fig. 3a). To illustrate the distribution of RMSE in the ensemble
of surrogates, a more detailed example of the outcomes is shown in Fig. 7 for
the grid points along the 0° meridian. The results for the HadCM3 data are
not shown, but they also confirm the outcomes of the direct comparison of
multiple linear regression and local linear models. Similarly, surrogate
data-based verification of the results derived from the H850 prediction showed no
major differences either.
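
The significance marking used with the surrogate maps (see the caption of Fig. 6) relies on a one-sided rank-order test: the null hypothesis is rejected only if the original series is predicted better than every surrogate. A minimal sketch, with illustrative function names, shows why 10 surrogates correspond to a confidence level of about 91%.

```python
def rank_order_confidence(n_surrogates):
    """Confidence level of a one-sided rank-order test with n surrogates."""
    return 1.0 - 1.0 / (n_surrogates + 1)

def rejects_null(rmse_original, rmse_surrogates):
    """Reject when the original RMSE is smaller than that of all surrogates."""
    return all(rmse_original < r for r in rmse_surrogates)
```

Here `rank_order_confidence(10)` evaluates to 10/11, i.e. roughly 0.91, and reaching the 95% level with this test would take 19 surrogates.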

It should be mentioned that when an identical setting is used for direct
comparison-based and surrogate data-based tests, including an equal size of
the calibration set, $SS_{LM}$ is systematically smaller than $SS_{surr}$. The reason
for this difference is related to the behavior of the method of local models
for purely linear series. When the processed signal contains no deterministic
nonlinear component (as is the case for surrogates) and $M$ is smaller than the
size of the calibration set, the method of local models performs slightly worse
than linear regression. Our choice of a shorter calibration set for the surrogate
data-based tests (Sect. 3.3) has actually partly compensated for this shift,
because the magnitude of detected nonlinearity generally decreases with the
reduction of the size of the calibration set.


Fig. 6: Geographical distribution of $SS_{surr}$, obtained for the RT850-500 prediction,
using the NCEP/NCAR data. Diamonds mark positions of the grid points where the
value of RMSE for the original series was not smaller than for all 10 surrogates. This
is equivalent to the non-rejection of the hypothesis of a linear Gaussian generating
process at the confidence level of about 91%, according to the usually applied
one-sided rank-order test, described, e.g., in [9]. Testing at a higher confidence level
would require more surrogates, but even then, the results would be almost identical,
as additional tests have shown for selected individual grid points.


Fig. 7: Left panel: RMSE of the RT850-500 prediction by the local linear models
method and its range for the respective surrogate series, for grid points at 0°E
(NCEP/NCAR data). Right panels: Values of RMSE (m) obtained for the original
series and individual surrogates in the three selected grid points.


Fig. 8: Same as Fig. 3a, for series with removed annual cycle.


Fig. 9: Same as Fig. 3a, for the DJF (a) and JJA (b) seasons (winter and summer
in the northern hemisphere) instead of the entire year.


5 Seasonal Variations of Nonlinearity

The annual cycle is among the strongest oscillations in the climate system.
It dominates series of many climatic variables, especially in higher latitudes
(see example in Fig. 1). This also means that the geographical areas with
a well-defined annual cycle coincide to some degree with the regions where
strong nonlinearity was detected. To assess the possible relationship, we
repeated some of the tests for the series with removed annual cycle. The removal
was carried out by subtracting the mean climatological annual cycle of the
respective variable, computed for the years 1961-2000 and smoothed by an
11-day moving average. An example of the results is shown in Fig. 8, for the
RT850-500 prediction. As a comparison to Fig. 3a reveals, the values of $SS_{LM}$
generally decreased after the annual cycle removal. Although this change was
relatively small on average (e.g., the average value of $SS_{LM}$ decreased from
0.29 to 0.23 in the area north of 25°N, and from 0.24 to 0.21 south of 25°S),
it was profound in the regions with the highest amplitude of the annual cycle
of RT850-500. For instance, the maximum of $SS_{LM}$, originally detected over
East Asia and the adjacent part of the Pacific Ocean, disappeared almost
completely. In the southern hemisphere, the changes associated with the
annual cycle removal were generally smaller. In the case of the H850 prediction,
the shape of the pattern of $SS_{LM}$ remained practically identical for the
annual cycle-free series, though the average degree of nonlinearity also slightly
decreased.
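
The annual cycle removal described above can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the calendar has a fixed number of days per year (`n_days`, e.g. 365 with leap days dropped), every calendar day occurs at least once in the record, and the moving average wraps around the end of the year. The function name is illustrative.

```python
import numpy as np

def remove_annual_cycle(values, day_of_year, n_days=365, window=11):
    """Subtract the smoothed mean climatological annual cycle.

    values      : daily series
    day_of_year : calendar day of each value, integers 1..n_days
    window      : odd length of the moving average, in days
    """
    values = np.asarray(values, dtype=float)
    idx = np.asarray(day_of_year, dtype=int) - 1
    # Mean value for each calendar day over all years of the record
    sums = np.bincount(idx, weights=values, minlength=n_days)
    counts = np.bincount(idx, minlength=n_days)
    clim = sums / counts
    # Circular moving average: mean of the climatology shifted by -half..half days
    half = window // 2
    smooth = np.mean([np.roll(clim, k) for k in range(-half, half + 1)], axis=0)
    return values - smooth[idx]
```

Note that an 11-day average attenuates a pure annual harmonic only marginally, so nearly all of the mean seasonal signal is removed while day-to-day sampling noise in the climatology is suppressed.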

In many situations, the annual cycle cannot be treated as simply an
oscillation superposed on the variations at other time scales. Different seasons are
associated with different atmospheric dynamics in many regions, and
properties of the analyzed time series, including their possible nonlinearity, may thus
periodically vary throughout the year. Because of this, the analysis of climatic
data is often performed separately for different parts of the year, typically
seasons or months. We used this approach to investigate the seasonal variations
of $SS_{LM}$. The results below are shown for the parts of the year corresponding
to climatological winter (December, January and February, DJF) and
summer (June, July and August, JJA) of the northern hemisphere. When the
analysis was carried out for separate seasons, the RMSE of the prediction by
linear regression decreased for most grid points in the annual average. The
performance of the method of local models usually became worse, primarily
due to the reduction of the amount of data available for the calibration of the
mappings. As a result, the average magnitude of nonlinearity decreased
somewhat, compared to the situation when the series were analyzed as a whole.
Despite this change, the basic features of the patterns of $SS_{LM}$ were still the
same, as can be seen from an example of the results based on the RT850-500
forecast (Fig. 9). In the equatorial area, the nonlinearity remained very weak
or undetectable in all seasons. In higher latitudes, the patterns retained some
of the basic shape detected for the year as a whole, but their magnitude
visibly varied with the season. The overall nonlinearity was stronger in the
DJF season than in JJA in the northern hemisphere, while in higher latitudes
of the southern hemisphere, this variation was reversed and JJA exhibited
stronger nonlinearity than DJF on average (Fig. 10, Table 3, columns 1 and
2). The seasonal changes were more strongly expressed in the northern hemisphere.
The seasonal variation was well simulated by the HadCM3 model (Table 3,
columns 3 and 4) and it was also detectable in the results based on the forecast
of H850, for both the NCEP/NCAR and HadCM3 data (not shown).


6 Discussion 

All performed analyses revealed a common basic latitudinal structure with 
just negligible nonlinearity in the equatorial regions, but generally stronger 
nonlinear behavior in the mid-latitudes of both hemispheres. A detailed 
analysis of the factors behind the observed patterns might be problematic, because 
they do not seem to be a result of a single driving force, but rather their 
complex combination. There are, however, some possible links worth mentioning. 
In the case of the results based on the RT850-500 prediction, there may be 
a connection between more pronounced nonlinearity in the mid-latitudes and 
the activity of the polar front. The strongest nonlinear behavior over Europe 
and North America seems to coincide with the position of the zones where air 
masses of different origin often interact. In the southern hemisphere, where 
the landmasses are less extensive, areas of the strongest nonlinearity are 
typically located rather east of the continents, possibly because of the interaction 
of the landmass with the prevailing westerlies. Between approximately 50°S 
and 60°S, where the amount of land is very small, nonlinearity is weaker 
on average. A removal of the annual cycle from the series slightly decreases 
the magnitude of detected nonlinearity, but except for the regions where the 
annual variation is very strong (East Asia), the effect of the annual cycle 
presence does not dominate the results. For the H850 forecast, there appears to 
be a certain connection of the areas with strong nonlinearity to the zones of 
high horizontal gradient of H850. In the northern hemisphere, such areas are 
typically associated with deep stationary cyclones, which are usually present 
over the North Atlantic and North Pacific during winter. The match is not 
perfect though, and there may be some other factors involved. Altogether, it 
seems that nonlinearity tends to be stronger in the regions with more 
complex dynamics, where strong driving or perturbing factors are in effect. This 
hypothesis is supported by the fact that nonlinearity is generally more 
pronounced during the colder season in the mid-latitudes of both hemispheres, 
i.e., in situations when the temperature gradient between the equatorial area 
and the polar region is strongest. The fact that the seasonal variations are 
more distinct in the northern hemisphere is probably an effect of the uneven 
distribution of the continents, resulting in a larger influence of the continental 
climate in the northern mid-latitudes. 

Fig. 10: Distribution of SS lm, obtained for the prediction of RT850-500 for the DJF 
and JJA seasons in different latitudes (NCEP/NCAR data). 


Table 3: Seasonal variations of nonlinearity (expressed by SS lm) in the 
NCEP/NCAR and HadCM3 data, for the RT850-500 prediction. 


Area          NCEP/NCAR         HadCM3 
              DJF     JJA       DJF     JJA 

25°N-70°N     0.23    0.16      0.27    0.20 
20°S-20°N     0.01    0.01      0.02    0.02 
70°S-25°S     0.16    0.20      0.16    0.18 

The two presented cases, based on the prediction of geopotential height 
and relative topography one day ahead, represent just a fraction of possible 
settings. From additional tests, carried out for different predictand-predictor 
combinations, it seems that the basic structure with weak nonlinearity in the 
equatorial area is typical for most situations. On the other hand, the finer 
details of the detected patterns vary, especially with the type of the predictand. 
The exact number and geographical configuration of predictors seem to be 
less important, as long as they sufficiently characterize the local state of the 
atmosphere. Besides the type of the studied variables, we also paid attention 
to the sensitivity of the results to the specific details of the tests. It appears 
that the results are rather robust to changes in the size of the source 
area of predictors, although using an area that is too large or too small leads 
to a general increase of the prediction error and a weakening of detected 
nonlinearity. The outcomes remain very similar when the input data are 
pre-processed by principal component analysis, instead of using the point-wise 
predictors directly. The method used to normalize the predictors, if any, also 
does not appear to be of major importance. The observed patterns of nonlinearity 
seem to be rather stable in time, i.e., the specific choice of the analyzed period 
does not have any major effect on the outcomes of the tests. The most distinct 
changes compared to the presented results were detected in the 
NCEP/NCAR data when the 1960s were chosen as the testing set instead of 
the years 1991-2000, especially in the southern hemisphere. This difference 
can probably be attributed to the variations in the amount of observational 
data entering the reanalysis, as discussed below. The applied tests were all 
based on prediction one day ahead; with an increase of the lead time, 
nonlinearity quickly weakened and it became undetectable for predictions more 
than approximately five days ahead, even in the regions where the nonlinear 
behavior was originally strongest. This is in good agreement with the fact 
that a deterministic weather prediction is impossible for too long lead times, 
regardless of the method. 

Most of the patterns of nonlinearity identified in the NCEP/NCAR 
reanalysis data were also found in the outputs of the HadCM3 model. From the 
perspective of applied nonlinear time series analysis tasks (such as statistical 
downscaling carried out by nonlinear methods), the fact that a climate model 
is able to reproduce the character of the observed data is encouraging. Still, 
from the results obtained for a single representative of global climate models, 
it is not possible to infer whether all existing climate simulations behave 
in a similar fashion. It is interesting that the correspondence of the structures 
found in the NCEP/NCAR and HadCM3 data tends to be better in the 
northern hemisphere. Although this fact can at least partially be a consequence of 
the specifics of the model’s physics, it might also be attributed to the 
character of the reanalysis data. To assess the possible influence of the specific 
properties of the NCEP/NCAR reanalysis, we repeated some of the tests for 
another commonly used gridded dataset based on observations, the ERA-40 
reanalysis [28]. Although some differences were found, the resemblance of the 
results from the NCEP/NCAR and ERA-40 data was generally strong in the 
northern hemisphere, but somewhat weaker in the southern one. This implies 
that caution is needed in the interpretation of the model-reanalysis differences, 
particularly in the southern latitudes, as they may be a result of a limited 
amount of observational data used by the reanalysis (and possibly some other 
specifics of the NCEP/NCAR dataset), not just imperfections of the climate 
model. This especially applies to the period preceding the era of 
meteorological satellites: e.g., the amount of data entering the NCEP/NCAR reanalysis 
is very low before the year 1979 south of approximately 40°S [19]. 

We have shown that the direct comparison of prediction by linear 
regression and by local linear models yields nonlinearity patterns very similar to 
the approach based on the application of local linear models for surrogate 
data. A practical advantage of the direct comparison lies in its speed, as there 
is no need for multiple realizations of a nonlinear model. This is especially 
convenient in the case of an analysis like ours, carried out for thousands of 
grid points and repeated for numerous settings. Another benefit of the direct 
comparison is that it provides specific information about the potential gain 
from employing a nonlinear method; its fundamental drawback is that such 
information may only be valid for the combination of the methods applied. 


7 Conclusions 

By analyzing the series of selected atmospheric variables, we were able to 
confirm the presence of systematic geographical and seasonal variations of 
nonlinearity. A simple and unequivocal physical explanation of the results beyond the 
basic tropics/mid-latitudes and summer/winter contrast may be problematic, 
because the finer details of the detected patterns are probably a product of 
multiple influences and they depend on the type of the predictand variable 
and some other factors. To find out whether any other general regularities exist 
would require a systematic analysis performed for a large number of variables 
and pressure levels. Regardless of the exact cause of the detected structures, 
their character was simulated fairly well by the HadCM3 model. From the 
practical perspective, this finding is rather promising, as it confirms that data 
produced by the current generation of global climate models can be utilized 
for the study of nonlinear properties of the climate system. 


Abstract. We discuss concepts for the prediction of extreme events based on time 
series data. We consider both probabilistic forecasts and predictions by precursors. 
Probabilistic forecasts employ estimates of the probability for the event to follow, 
whereas precursors are temporal patterns in the data typically preceding events. 
Theoretical considerations lead to the construction of schemes that are optimal 
with respect to several scoring rules. We discuss scenarios for which, in contrast 
to intuition, events with larger magnitude are more predictable than events with 
smaller magnitude. 


Keywords: Extreme events, Forecasting, Scoring rules, Receiver operating 
characteristic, Precursor 


1 Prediction of Events 

Geophysical processes are characterized by complicated time evolutions which 
are generally aperiodic on top of potential seasonal oscillations and exhibit 
large fluctuations. This applies to all processes related to or caused by the 
atmosphere, but is also true for geological processes. The prediction of extreme 
events is of particular interest due to their usually large impact on human life, 
as exemplified by earthquakes, storms, or floods. For many such processes 
no detailed physical models and also no useful observations to put into such 
models are available, so that their prediction is very often a time series 
task. But even in much more favorable situations where sophisticated models, 
sophisticated observations, and hence model based forecasts exist, extreme 
events pose challenges. Due to its immense relevance in all aspects of daily 
life, the weather has been subject to forecasts for centuries, on various levels of 
sophistication. Weather predictions are nowadays generated on a daily basis 
with the involvement of an enormous body of scientific results and 
computational resources. This type of prediction is different though from the prediction 
of extreme events in one specific aspect: Weather predictions are designed to 
perform well under a large variety of “typical” situations, most of which are 
not extreme in any sense. Hence, a prediction scheme which works excellently 
on average might completely fail in rare but extreme situations. Indeed, there 
have been situations in recent years where public warnings of extreme weather 
situations turned out to be inadequate, such as with the Great Storm of 
October 15/16, 1987 in southern England [1, 2] or the extreme precipitation event 
in Saxony on 12/13 August, 2002, leading to flooding of the river Elbe and 
its tributaries. Both events were misplaced or overlooked by medium range 
weather forecasts [3]. In this chapter, we discuss the predictability and 
prediction schemes for extreme events, based exclusively on time series analysis. We 
are not employing any prior knowledge about physical processes or models 
for the phenomenon under study, but rely only on recordings of past data. 
For weather prediction over lead times larger than a few hours, this approach 
would not be too reasonable, as the atmospheric phenomena governing the 
evolution of weather are fairly well understood, and physical models have 
been demonstrated to yield useful forecasts over a large spectrum of spatial 
and temporal scales. In a variety of other circumstances though, a time series 
approach is the only possibility for predictions at all. A frequent reason is that 
good models for the specific situation are not available. But even if such 
models exist, generating a useful forecast requires feeding them with state 
variables, which in turn have to be estimated from (often rather incomplete and 
noisy) measurements, and this often requires more than reasonable effort.¹ 

In the time series setting, we will assume that the unknown dynamics 
is a nonlinear and stochastic process, which can generally be described in 
a nonparametric and data driven way by all its joint probabilities. An even 
stronger assumption is that the dynamics is a generalized Markov process, 
which can be described by a finite set of transition probabilities (rather than 
the infinite set of joint probabilities). For the present work, this assumption 
is only important insofar as the envisaged prediction methods are suboptimal 
if the process is not Markovian. The stochastic character of the processes 
under consideration suggests that the forecasts should be probabilistic. In other 
words, our schemes will provide us with probabilities of an event to come. We 
will argue that probabilistic predictions require performance measures which are 
different from standard ones such as the root mean squared prediction error. 
As a possible alternative, the concept of scores provides measures to quantify 
the success of probabilistic forecasts. We will discuss and employ two popular 
examples, the Brier score and the Ignorance score. A problem with scores in 
connection with extreme events is that the average score over many forecast 
instances is taken as a quality measure for the forecasting system. Since by its 
very nature the base rate of an extreme event is very small, there are very few 
instances where having a good forecast actually makes a difference, whence 
the score of an excellent forecast is on average only marginally better than 
the score of a just mediocre forecast. 


¹ This problem, known as data assimilation, takes about 50% of the total CPU 
time required to generate a medium range (= 10 days) weather forecast [4]. 

A scoring scheme that is probably much more suitable for our purpose is the 
Receiver Operating Characteristic (ROC). An important observation is that both 
the scores and the ROC encourage essentially similar forecasting strategies, 
namely to use the probability of the event given the previously observed time 
series. A simple but very effective approximation of this conditional 
probability leads to a scheme using precursors of extreme events, an approach which 
can be motivated independently. We discuss the performance of our prediction 
schemes for simple model processes, and verify these findings by predictions 
of turbulent wind gusts and large fluctuations in laboratory turbulence. As a 
striking result, we find that under certain circumstances, events become more 
predictable the more extreme they are. 


2 Time Series Data and Conditional Probabilities 

Suppose we are given a time series $\{x_1, \ldots, x_N\}$ of $N$ data points, which 
are evenly spaced in time, where $N$ is called the sample size. The time series 
can be vector valued, but often is scalar. Regardless of what process and what 
measurement function creates the data, we will interpret this sequence as 
being generated by a stochastic process. As is well known [5, 6], a stochastic 
process is fully characterized by all its joint cumulative probabilities, 
$P(x_{i_1} < \theta_1, x_{i_2} < \theta_2, \ldots, x_{i_k} < \theta_k)$ for all $k$ and all possible 
sequences of indices $i_j$, $j = 1, \ldots, k$. For simplicity we will assume the process 
to be stationary, which implies that all joint probabilities depend only on times 
relative to the time of the first argument, such that we can always set $i_1 = 0$. 
Moreover, in the following, we will order the time indices in descending order, the 
indices further right referring to times further in the past. 

Stationarity is a property which almost never applies to realistic processes 
such as atmospheric turbulence. Applying concepts from stationary processes 
to data which might originate from a non-stationary process could result in 
reduced performance of our prediction algorithms, but is not a fundamental 
problem for the examples of non-stationary data² which we study in this 
contribution. Moreover, the methods proposed in this contribution should also 
be suitable in the special case of non-stationarity due to slowly varying system 
parameters, as was argued in [7]. In terms of the prediction of wind speeds in 
high frequency wind speed data, those slowly varying system parameters are 
related to changing weather conditions or changes of the time of the day. 

From a joint probability one easily computes conditional probabilities, 
which denote the probability to find a specific value for the variable $x_\Delta$ if the 
values for the past variables $x_0, x_{-1}, \ldots, x_{1-\tau}$ are given. Joint and conditional 
probabilities are connected by the well-known Bayes’ theorem 

$$p(x_\Delta \mid x_0, x_{-1}, \ldots, x_{1-\tau}) := \frac{p(x_\Delta, x_0, x_{-1}, \ldots, x_{1-\tau})}{p(x_0, x_{-1}, \ldots, x_{1-\tau})}. \qquad (1)$$

² Data are called non-stationary if the null-hypothesis of stationarity can easily be 
rejected. 


Note that Bayes’ theorem and many other expressions in this contribution 
could also be formulated in terms of probability densities. Since the distinction 
between probabilities and probability densities is not relevant as far as 
numerical estimates of probability densities are concerned, we formulate the 
corresponding expressions mostly in terms of probabilities and only refer to 
probability densities when analytical considerations are involved. 

Conditional probabilities provide the information needed for 
(probabilistic) predictions: Knowing $p(x_\Delta \mid x_0, x_{-1}, \ldots, x_{1-\tau})$ as a function of 
$x_0, x_{-1}, \ldots, x_{1-\tau}$, and given specific values for these variables, one can 
calculate the probability that the observation $\Delta$ time steps in the future will fall 
into a given interval. Generally, the probability density function or probability 
mass function of $x_\Delta$ becomes sharper the further into the past the conditioning 
extends. Ideally, the entire past of the process would be observed and the 
conditional PDF for infinite conditioning would be known, thus yielding optimal 
knowledge of the future (which does not mean that this conditional 
probability necessarily becomes sharply peaked like a $\delta$-function). In practice this is 
absolutely out of reach. The practical difficulty here is to estimate the 
conditional probability from the sample of $N$ data points, as this estimate becomes 
worse the larger $\tau$ is. If the observed time series were governed by a generalized 
Markov process of order $\tau_0$, then the $\tau_0$-step conditioning would be optimal 
and any additional conditioning would not improve (or in fact change) the 
forecast. In general, although the process is not Markovian, finite 
conditioning still provides a rather good approximation to infinite conditioning, or in 
more colloquial terms, there is nothing wrong with basing one’s predictions 
on finite $\tau$-conditioning; the worst that can happen is that this is sub-optimal. 
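
As an illustration of estimating such a conditional probability from a finite sample, the following minimal sketch uses the shortest possible conditioning ($\tau = 1$, $\Delta = 1$) and a simple binning estimator. The AR(1) test process, its coefficient, the threshold value and the bin layout are all assumptions made for this example and are not taken from the text.

```python
import numpy as np

# Assumed test process (not from the chapter): AR(1), x_{t+1} = a*x_t + noise.
rng = np.random.default_rng(0)
a, N = 0.75, 100_000
x = np.zeros(N)
for t in range(N - 1):
    x[t + 1] = a * x[t] + rng.standard_normal()

eta = 2.0                                        # assumed threshold
future_exceeds = (x[1:] >= eta).astype(float)    # 1 if the next value is >= eta
present = x[:-1]                                 # conditioning variable x_0

# Bin the conditioning variable and count, per bin, how often the
# next value exceeds the threshold.
edges = np.linspace(-4.0, 4.0, 17)               # bins of width 0.5
bin_index = np.digitize(present, edges)          # 0 .. len(edges)
n_bins = len(edges) + 1
counts = np.bincount(bin_index, minlength=n_bins)
hits = np.bincount(bin_index, weights=future_exceeds, minlength=n_bins)

# Relative frequency per bin estimates P(x_1 >= eta | x_0 in bin);
# empty bins give NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    p_cond = hits / counts
```

With longer conditioning the number of bins grows exponentially in $\tau$ while the sample size stays fixed, which is one way to see why the estimate deteriorates for large $\tau$.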

So far we have been discussing $p(x_\Delta \mid x_0, x_{-1}, \ldots, x_{1-\tau})$. When we want to 
predict the occurrence of events, we could in principle make a prediction of 
$x_\Delta$, and then derive from the value of $x_\Delta$ whether it fulfills the 
criterion for an event to follow or not. However, the following approach is simpler, 
faster, and more general: We consider a second time series $\{\chi_1, \chi_2, \ldots, \chi_N\}$, 
$\chi_i \in \{0, 1\}$, which is the event time series, where $\chi_i = 1$ means that an event 
takes place. In many applications, this series is derived from the original time 
series $\{x_1, \ldots, x_N\}$, for example by defining 

$$\chi_i = \begin{cases} 1 & x_{i+1} \geq \eta, \\ 0 & x_{i+1} < \eta, \end{cases} \qquad (2)$$

if the event under study is defined as a crossing of a given threshold $\eta$, or 

$$\chi_i = \begin{cases} 1 & x_{i+1} - x_i \geq \eta, \\ 0 & x_{i+1} - x_i < \eta, \end{cases} \qquad (3)$$

if the event is defined as an increment $x_{i+1} - x_i$ larger than or equal to $\eta$. 
However, the events could also be defined using the observation of some other 
quantity. In fact, the only important requirements are that at the time when the 
forecast has to be made, $\chi_i$ is unknown, and that its actual value is revealed later. 
Since the relative time distance $\Delta$ between the last observed value $x_0$ and the 
event is irrelevant for the following discussion, we shift the time indices of the 
event series in such a way that the event with time index $i$ is to be predicted 
from the observation sequence terminating with the time index $i$, regardless 
of how far into the future the prediction is made. 
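
The two ways of deriving an event series from the observations can be sketched as follows. The function names and the toy data are hypothetical, and the one-step lead (the event indexed $i$ refers to $x_{i+1}$) is an assumption of this sketch.

```python
import numpy as np

def threshold_events(x, eta):
    """chi_i = 1 if x_{i+1} >= eta, else 0 (threshold crossing)."""
    x = np.asarray(x, dtype=float)
    return (x[1:] >= eta).astype(int)

def increment_events(x, eta):
    """chi_i = 1 if x_{i+1} - x_i >= eta, else 0 (large increment)."""
    x = np.asarray(x, dtype=float)
    return (np.diff(x) >= eta).astype(int)

# Toy observations, for illustration only.
x = np.array([0.1, 0.4, 2.3, 2.6, 0.2, 1.8])
chi_threshold = threshold_events(x, eta=2.0)
chi_increment = increment_events(x, eta=1.5)
```

Both event series are one element shorter than the observation series, because the last observation has no successor from which an event could be determined.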

In the following we will concentrate on events which are extreme. For our 
purposes we specify that we understand as an extreme event an event which 
is rare, which is recurrent, and to which we can assign a magnitude $\eta$, which 
assumes a large value if the event takes place.³ 

Using then both time series, one can construct the joint probabilities 
$p(\chi_0, x_0, x_{-1}, \ldots, x_{1-\tau})$ which contain all dependencies between the event and 
the sequence of observations down to $\tau$ temporal steps into the past. By the 
stationarity assumption this joint probability is equal to 
$p(\chi_i, x_i, x_{i-1}, \ldots, x_{i+1-\tau})$ for any $i$. The prediction schemes to be discussed 
in the following will exploit such joint probabilities, which themselves will be 
estimated from the data records. We will abbreviate a vector of $\tau$ successive time 
series elements as $(x_i, x_{i-1}, \ldots, x_{i-\tau+1}) =: \mathbf{x}_i$. The explicit value of $\tau$ is 
suppressed in this notation. Also note that strictly speaking, $p(\chi_i, \mathbf{x}_i)$ is a 
probability in the argument $\chi_i$, but a density in the argument $\mathbf{x}_i$. The 
interpretation is that for any volume $V$ in $\mathbb{R}^\tau$, 

$$P(\chi_i = 1, \mathbf{x}_i \in V) = \int_V p(\chi_i = 1, \mathbf{x}_i)\, d\mathbf{x}_i. \qquad (4)$$


3 Probabilistic Forecasts and Prediction through Precursors 

Assuming the process which generates the observations and the events to 
be stochastic calls for probabilistic predictions. Such predictions consist of 
random variables $p_i$, called forecast probabilities, which are issued at time $i$. 
If the $\tau$-dimensional vector $\mathbf{x}_i$ is used to represent the current state of the 
process, then the forecast probabilities are a function $p(\mathbf{x}_i)$ of $\mathbf{x}_i$ with values 
between zero and one. The function $p(\mathbf{x}_i)$ is called a probabilistic predictor. 
Intuitively, one would hope that $p(\mathbf{x}_i)$ gives the probability of $\chi_i = 1$ given 
$\mathbf{x}_i$, or 

$$p(\mathbf{x}_i) = p(\chi_i = 1 \mid \mathbf{x}_i). \qquad (5)$$

We will see in Sect. 4 that many reasonable measures of forecast success 
support this intuition, that is, they give maximum possible scores if $p(\mathbf{x}_i)$ 
indeed agrees with the probability of $\chi_i = 1$ given $\mathbf{x}_i$. 


³ Note that this is not a general definition, and that other people might understand 
the term extreme event in a different way. 

A seemingly different way to motivate $p(\chi_i = 1 \mid \mathbf{x}_i)$ as a good forecast 
probability is through reliability. Reliability means that on condition that the 
forecast (approximately) equals $z$, the event should occur with a relative 
frequency (approximately) equal to $z$, too. As an optimality criterion, reliability 
is not sufficient to single out a particular forecasting scheme, since any 
conditional probability of the form $p(\chi_i = 1 \mid I)$ is reliable, independent of what $I$ 
is. In particular, the unconditional probability $p(\mathbf{x}) = \text{const.} = P(\chi_i = 1)$ is 
reliable as well. Hence, in addition to reliability, the forecast should feature a 
high correlation with the actual event. This property is known as sharpness. 
It can be demonstrated that $p(\chi_i = 1 \mid \mathbf{x}_i)$ is indeed the reliable forecast which 
features maximum sharpness among all functions of $\mathbf{x}_i$. As we will briefly 
discuss in Sect. 4, it is in fact for the same reason that $p(\mathbf{x}_i) = p(\chi_i = 1 \mid \mathbf{x}_i)$ 
achieves optimal scores. 
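
The distinction between reliability and sharpness can be checked numerically. The sketch below is only an illustration under stated assumptions: the logistic form of the conditional probability, the sample size, and the helper name `reliability_table` are invented for this example. It compares the conditional-probability forecast with the constant base-rate forecast; the variance of the forecast is used here as a simple proxy for sharpness.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.standard_normal(N)

# Assumed (synthetic) conditional probability p(chi = 1 | x); logistic form
# chosen only for illustration.
p_true = 1.0 / (1.0 + np.exp(-(2.0 * x - 3.0)))
chi = (rng.random(N) < p_true).astype(int)

def reliability_table(p, chi, n_bins=10):
    """Per forecast bin: (mean forecast, observed event frequency, count)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p[mask].mean(), chi[mask].mean(), int(mask.sum())))
    return rows

forecast_cond = p_true                      # conditional-probability forecast
forecast_const = np.full(N, chi.mean())     # constant base-rate forecast

rows_cond = reliability_table(forecast_cond, chi)
rows_const = reliability_table(forecast_const, chi)

# Variance of the forecast as a simple proxy for sharpness.
sharpness_cond = forecast_cond.var()
sharpness_const = forecast_const.var()
```

In both tables the observed frequency should match the mean forecast within sampling error, so both forecasts are reliable; only the conditional forecast varies from instance to instance.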

If experimental data are to be investigated, the exact shape of $p(\chi_i = 1 \mid \mathbf{x}_i)$ 
(or any other probability, for that matter) is of course unknown and has to be 
estimated from data. There exist many sophisticated algorithms to 
approximate conditional probabilities. These algorithms, although of great value in 
the analysis of time series, often result in exceedingly complex models for 
$p(\chi_i = 1 \mid \mathbf{x}_i)$ and are therefore of limited use in real time implementations, 
where simple and fast algorithms are required. 

An intuitive and indeed widespread approach to the prediction of extremes 
is to search for precursors (e.g., [8, 9]). A precursor is a pattern in the time 
series, i.e., a sequence of $\tau$ values $(x_0, \ldots, x_{1-\tau})$ which “typically” precedes 
an event. In the following, precursors will be denoted by $\mathbf{u} := (u_0, \ldots, u_{1-\tau})$. 
The assumed stochastic nature of the process implies that there are events 
which are not preceded by a sequence of observations similar to 
the precursory pattern, but that there are also incidents where the sequence 
of observations is very similar to the precursor, though no event follows. The 
prediction by precursors requires first to choose one or more precursory 
patterns (after fixing the parameter $\tau$). We are going to address the issue of how 
to identify such precursors at the end of this section. Suppose for the moment 
that we had already chosen a precursor $\mathbf{u}$ in one way or another. Next, we 
define an alarm volume $V(\delta, \mathbf{u})$ around each precursor as the set of all $\mathbf{x}_i$ for 
which $\|\mathbf{x}_i - \mathbf{u}\| < \delta$, where $\|\cdot\|$ denotes a norm which can be, for example, 
the Euclidean norm or the maximum norm. When using the maximum norm, 
the alarm volume consists of all time series segments which fall into a $\delta$-tube 
around the precursory pattern $\mathbf{u}$. The challenge in this approach is to 
determine good precursory structures, since they are the core of this prediction 
scheme. Two approaches have been studied, both of some intuitive appeal: 

Strategy I: After having collected all events $\chi_i = 1$ from the recorded data, 
one studies what typically happens before these events. This leads one 
to study the conditional probability $p(\mathbf{x}_i \mid \chi_i = 1)$. A reasonable way to 
extract a precursor $\mathbf{u}$ is to ask for $\mathbf{u}$ to maximize this probability. Then, 
the precursor represents the time series pattern which is most probably 
observed before an event takes place. 

Strategy II: Alternatively, one can look through all possible precursory 
patterns $\mathbf{x}$ and define $\mathbf{u}$ to be the pattern for which $p(\chi_i = 1 \mid \mathbf{x}_i)$ is maximal, 
that is, the pattern which has the largest probability to be followed by an 
event. 

Note that the conditional probability used for strategy II is the conditional 
probability which was suggested for probabilistic forecasting at the beginning 
of this section. 

Strategy I might seem a very intuitive approach to look for precursory 
structures. However, the considerations in Sect. 4 and the results in Sect. 5 
will show that $p(\chi_i = 1 \mid \mathbf{x}_i)$ is in some sense the optimal prediction scheme. 
As strategy II essentially approximates this conditional probability, the 
performance obtained by identifying precursors with strategy I is expected to be 
no better than that obtained with strategy II. 
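
The two strategies and the alarm volume can be sketched for the simplest case $\tau = 1$, where a precursor is a single value. Everything specific in the following example is an assumption made for illustration: the AR(1) test process, the threshold, the histogram bins, the minimum bin count used to suppress poorly sampled bins, and the helper name `alarm_stats`.

```python
import numpy as np

# Assumed test process (illustration only): AR(1) with threshold events.
rng = np.random.default_rng(2)
a, N, eta, delta = 0.75, 200_000, 3.0, 0.5
x = np.zeros(N)
for t in range(N - 1):
    x[t + 1] = a * x[t] + rng.standard_normal()

obs = x[:-1]                          # x_i, the observation (tau = 1)
chi = (x[1:] >= eta).astype(int)      # chi_i = 1 if the next value is >= eta

edges = np.arange(-8.0, 8.5, 0.5)
centers = 0.5 * (edges[:-1] + edges[1:])
n_all, _ = np.histogram(obs, bins=edges)
n_evt, _ = np.histogram(obs[chi == 1], bins=edges)

# Strategy I: the value most often observed before an event,
# i.e. the histogram mode of p(x_i | chi_i = 1).
u1 = centers[np.argmax(n_evt)]

# Strategy II: the value most likely to be followed by an event,
# i.e. the maximum of p(chi_i = 1 | x_i); sparse bins are excluded.
min_count = 50
p_cond = np.where(n_all >= min_count, n_evt / np.maximum(n_all, 1), -1.0)
u2 = centers[np.argmax(p_cond)]

def alarm_stats(u):
    """Hit rate, false alarm rate, and fraction of alarms followed by an event."""
    alarm = np.abs(obs - u) < delta   # alarm volume V(delta, u) for tau = 1
    hit_rate = chi[alarm].sum() / chi.sum()
    false_alarm_rate = (alarm & (chi == 0)).sum() / (chi == 0).sum()
    precision = chi[alarm].mean()
    return hit_rate, false_alarm_rate, precision

stats_1 = alarm_stats(u1)
stats_2 = alarm_stats(u2)
```

The two strategies generally select different precursors, so the same alarm radius $\delta$ leads to different trade-offs between hits and false alarms; a fair comparison would vary $\delta$ and compare the resulting curves rather than a single operating point.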


4 Scoring Schemes 

In this section, the question of how to quantify the performance of forecasts 
is addressed. Performance measures are important not only in order to rank 
existing forecast schemes but also in the design of such schemes, for example 
the tuning of free parameters. Measuring the success of predictions in terms of 
how “close” they eventually come to the truth is a paradigm which presumably 
requires no further motivation. The (root) mean squared error, briefly revisited 
in Sect. 4.1, is just one among many possible variants of this paradigm, albeit 
a very important and popular one. If we intend to formulate our forecasts in 
terms of probabilities though, the paradigm cannot be applied readily without 
modification, as the notion of “distance” between forecast and truth ceases 
to be meaningful. But probability forecasts essentially quantify how likely it 
is that a given potential event will come true, thus already providing a sort of 
self-rating. Hence it seems reasonable to value the success of a probability forecast 
in terms of how confident the forecast was of the event which eventually 
occurred, in relation to other events which did not. This idea is implemented 
in the concept of scores, explained in Sect. 4.2. 

A third approach to measuring the quality of probabilistic forecasts is the 
Receiver Operating Characteristic (ROC), presented in Sect. 4.3. Different 
from scores, the ROC, albeit taking the probabilistic character of the forecast 
into account, is insensitive to the reliability of the forecast. 

4.1 RMS Error 

When predicting future values of some time series, a commonly used criterion 
to quantify the success is the root mean squared (RMS) prediction error: Let 
$\hat{x}_i$ be a prediction generated by some algorithm and $x_i$ the observation, then 
one defines the RMS error as 

$$e = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{x}_i - x_i)^2}. \qquad (6)$$

The same quantity can be computed for predictions $\hat{\chi}_i$ of the event variable 
$\chi_i$, where both $\hat{\chi}_i$ and $\chi_i$ can only assume the values 0 or 1. In our context, 
where $\chi_i = 1$ is rare, such a scoring will favor prediction schemes which 
predominantly ignore the occurrence of events altogether. Assume that $\chi_i$ denotes the 
occurrence of an earthquake of large amplitude on day $i$. It is evident that 
a predictor which never predicts the earthquake to come will fail only 
once during very many years, thus yielding an almost zero RMS error. In the 
uniform average over time, mis-prediction of rare events is simply averaged 
out. Also, the RMS error relies on the norm of the difference between 
prediction and future value and is therefore symmetric, i.e., the “costs” implied by 
a false alarm are identical to those of a missed hit. When discussing extreme events, 
such costs are usually very different, but their quantification usually is an art 
of its own, so that no values for these costs are available. This suggests that 
a better scoring should consider these two types of errors, namely false alarm 
and missed hit, separately. 
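
A small numerical illustration of this point follows. It is a sketch with assumed numbers: the base rate, the 2% false alarm fraction, and the hypothetical "cautious" alarm system are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
N, base_rate = 100_000, 0.001                 # assumed rare-event base rate
chi = (rng.random(N) < base_rate).astype(int)

def rms_error(pred, obs):
    """Root mean squared difference between prediction and observation."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.sqrt(np.mean((pred - obs) ** 2))

# Predictor A never issues an alarm.
never = np.zeros(N, dtype=int)

# Predictor B (hypothetical) catches every event, but also raises a
# false alarm on about 2% of the event-free instances.
false_alarms = (rng.random(N) < 0.02).astype(int)
cautious = np.where(chi == 1, 1, false_alarms)

rms_never = rms_error(never, chi)        # square root of the event fraction
rms_cautious = rms_error(cautious, chi)  # square root of the false alarm fraction
```

Judged by RMS error alone, the predictor that misses every single event is ranked above the one that misses none, simply because events are much rarer than the false alarms.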

4.2 Brier Score and Ignorance 

A scoring rule [10, 11, 12, 13] is a function $S(p, z)$ where $p \in [0, 1]$ and $z$ is 
either zero or one. If $p$ is the actual forecast and $\chi_i$ is the corresponding event, 
then $S(p, \chi_i)$ quantifies how well $p$ succeeded in forecasting $\chi_i$. A scoring 
rule effectively defines two functions: $S(p, 1)$, quantifying the score in case the 
forecast is $p$ and the event happens, and $S(p, 0)$, quantifying the score in 
case the forecast probability is again $p$ but the event does not happen. Two 
important examples are the Ignorance score, given by the scoring rule 

$$S(p, \chi_i) := -\log(p) \cdot \chi_i - \log(1 - p) \cdot (1 - \chi_i), \qquad (7)$$

and the Brier score, given by the scoring rule 

$$S(p, \chi_i) := (\chi_i - p)^2 = (1 - p)^2 \cdot \chi_i + p^2 \cdot (1 - \chi_i). \qquad (8)$$

These definitions imply the convention that a smaller score indicates a better 
forecast. 

A score is a “point-wise” (evaluated at every single time instance) measure
of performance. It quantifies the success of individual forecast instances by
comparing the random variables $p$ and $\chi$ pointwise. The general quality of
a forecasting system (as given here by the random variable $p$) is commonly
measured by the average score $E[S(p, \chi)]$, which can be estimated by the
empirical mean

    $$ E[S(p, \chi)] \approx \frac{1}{N} \sum_{i=1}^{N} S(p_i, \chi_i) \qquad (9) $$

over a sufficiently large data set $(p_i, \chi_i)$.
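
As an illustration of the two scoring rules and of the empirical mean, the
sketch below evaluates both scores on synthetic forecasts. Everything about the
data is an assumption made for this example: the event probabilities are drawn
uniformly from [0, 0.3], and the clipping constant `eps` is introduced here only
to avoid evaluating log(0).

```python
import numpy as np

def brier(p, chi):
    """Brier score: squared difference between forecast and outcome."""
    return (chi - p) ** 2

def ignorance(p, chi, eps=1e-12):
    """Ignorance score. Forecasts are clipped to [eps, 1 - eps]
    (an assumption of this sketch) so that log(0) cannot occur."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.log(p) * chi - np.log(1.0 - p) * (1 - chi)

rng = np.random.default_rng(1)
n = 50_000
p_true = rng.uniform(0.0, 0.3, size=n)           # synthetic event probabilities
chi = (rng.random(n) < p_true).astype(int)       # events drawn from them

forecasts = {
    "calibrated forecast": p_true,
    "constant base rate": np.full(n, chi.mean()),
}
# Empirical mean of each score over the data set (p_i, chi_i).
mean_scores = {
    name: (brier(p, chi).mean(), ignorance(p, chi).mean())
    for name, p in forecasts.items()
}
for name, (b, ign) in mean_scores.items():
    print(f"{name:20s} Brier = {b:.4f}   Ignorance = {ign:.4f}")
```

Both average scores are smaller for the forecast that uses the actual event
probabilities than for the constant base-rate forecast, in line with the
convention that a smaller score indicates a better forecast.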

The rationale behind the two mentioned scoring rules, the Ignorance and
the Brier score, is rather obvious. If the event occurs, the score should become
better (i.e. decrease) with increasing $p$, while if it does not occur, the score
should become worse (i.e. increase) with increasing $p$. But why not simply take
$1 - p$ if the event occurs, and $p$ if it does not? To see the problem with
this “linear” scoring rule, define the scoring function

    $$ s(p, q) := S(p, 1) \cdot q + S(p, 0) \cdot (1 - q), \qquad (10) $$

where $q$ is another probability, that is, a number in the unit interval. Note
that the scoring function is the score averaged over cases where the forecast
is $p$ but in fact $q$ is the true distribution of $\chi$. In view of the
interpretation of the scoring function, it seems reasonable to require that the
average score of the forecast $p$ should be best (i.e. minimal) if and only if
$p$ in fact coincides with the true distribution of $\chi$. This means that the
divergence function (or loss function)

    $$ d(p, q) := s(p, q) - s(q, q) \qquad (11) $$

has to be positive definite, i.e., it has to be nonnegative, and zero only if
$p = q$. A scoring rule whose divergence function has this property is called
strictly proper [12, 14]. The divergence function of the Brier score, for
example, is $d(p, q) = (p - q)^2$, demonstrating that this score is strictly
proper. While the Ignorance is proper as well, the linear score is easily
shown to be improper.
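
The propriety argument can be checked numerically. The sketch below evaluates
the divergence function on a grid of forecasts for one fixed true probability;
the value q = 0.3 and the grid are arbitrary choices made for illustration.

```python
import numpy as np

def divergence(score, p, q):
    """d(p, q) = s(p, q) - s(q, q), with s(p, q) = S(p, 1) q + S(p, 0) (1 - q)."""
    def s(forecast):
        return score(forecast, 1) * q + score(forecast, 0) * (1.0 - q)
    return s(p) - s(q)

def brier(p, chi):
    return (chi - p) ** 2

def ignorance(p, chi):
    return -np.log(p) if chi == 1 else -np.log(1.0 - p)

def linear(p, chi):
    return 1.0 - p if chi == 1 else p

p = np.linspace(0.01, 0.99, 99)      # candidate forecasts
q = 0.3                              # assumed true event probability

for name, score in [("Brier", brier), ("Ignorance", ignorance), ("linear", linear)]:
    d = divergence(score, p, q)
    print(f"{name:9s} min d = {d.min():+.4f} at p = {p[d.argmin()]:.2f}")
```

For the Brier score and the Ignorance the divergence is nonnegative and
smallest at p = q, whereas for the linear score it becomes negative: with
q < 1/2 the average linear score is improved by forecasting a probability as
close to zero as possible, regardless of q.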

4.3 The Receiver Operating Characteristic 

The ROC [15] is a concept originating in signal detection, but it is applicable
to any problem in which, based on some evidence, we have to decide whether
a certain event will happen or not, for example whether it will rain tomorrow
or whether an extreme wind gust will occur within the next minute. We assume
the evidence to be a (rather general) random variable $\mathbf{x}$. To stay
with the wind example, the evidence used in this case could be the delay vector
$\mathbf{x}_i = (x_i, x_{i-1}, \ldots, x_{i-\tau+1})$ of previous wind
measurements. Suppose that $r(\mathbf{x}_i)$ is a real-valued function of the
evidence with the idea that a large $r$ is indicative of an event, while a
small $r$ is indicative of a non-event. An $r(\mathbf{x}_i)$ exceeding a
certain threshold $\delta$ could be interpreted as signaling an impending
event. The variable $r(\mathbf{x}_i)$ will henceforth be referred to as the
decision variable. Referring back to Sect. 3, if we use the precursor technique
to forecast an event, the decision variable could be the negative of the
Euclidean distance of the delay vector $\mathbf{x}_i$ to the (pre-defined)
precursor $\mathbf{u}$,

    $$ r(\mathbf{x}_i) := -\|\mathbf{x}_i - \mathbf{u}\|. \qquad (12) $$

Giving an alarm if $r(\mathbf{x}_i) > \delta$ is equivalent to giving an alarm
if $\mathbf{x}_i$ falls into the alarm volume $V_\delta$. Alternatively, (an
approximation to) the conditional probability
$r(\mathbf{x}_i) = p(\chi_i = 1 | \mathbf{x}_i)$ could be used as decision
variable.

The ROC curve for a certain decision variable $r$ comprises a plot of the
hit rate

    $$ H(\delta) := p(r > \delta \,|\, \chi_i = 1) \qquad (13) $$

versus the false-alarm rate

    $$ F(\delta) := p(r > \delta \,|\, \chi_i = 0), \qquad (14) $$

with $\delta$ acting as a parameter along the curve. Alternative names for the
hit rate are rate of true positives or the power of the test $r > \delta$.
Alternative names for the false-alarm rate are rate of false positives or the
size of the test $r > \delta$. It follows readily from the definitions that
both $H$ and $F$ are monotonically decreasing functions of $\delta$ with limits
0 for increasing $\delta$ and 1 for decreasing $\delta$. Hence, the ROC curve is
a monotonically increasing arc connecting the points (0,0) and (1,1).
Furthermore, note that monotonically increasing transformations of the decision
variable do not change the ROC at all, as is easily seen using the definitions
of the hit rate and the false-alarm rate. This is exactly the reason why, when
using the precursor approach to predict events, it is already sufficient to
specify the level sets $V_\delta$ as a function of $\delta$, but not necessary
to assign a probability value to each level set. A typical ROC curve is shown
in Fig. 1.
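
An empirical ROC curve can be obtained by sorting the decision variable and
counting. The sketch below is one possible implementation on synthetic data;
the function name `roc_curve`, the logistic relation between evidence and event
and the sample size are assumptions made for illustration only.

```python
import numpy as np

def roc_curve(r, chi):
    """Empirical false-alarm rate F and hit rate H for all thresholds.

    The threshold is swept from large to small values of the decision
    variable, so the returned arrays run from (F, H) = (0, 0) to (1, 1).
    Ties in r are not treated specially in this sketch."""
    r = np.asarray(r, dtype=float)
    chi = np.asarray(chi, dtype=int)
    order = np.argsort(-r, kind="stable")       # largest r first
    chi_sorted = chi[order]
    hits = np.cumsum(chi_sorted)                # alarms followed by an event
    false_alarms = np.cumsum(1 - chi_sorted)    # alarms followed by a non-event
    H = np.concatenate(([0.0], hits / chi.sum()))
    F = np.concatenate(([0.0], false_alarms / (1 - chi).sum()))
    return F, H

rng = np.random.default_rng(2)
n = 20_000
x = rng.standard_normal(n)                      # synthetic scalar evidence
chi = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 2.0)))).astype(int)

F1, H1 = roc_curve(x, chi)                      # decision variable r = x
F2, H2 = roc_curve(np.exp(x), chi)              # monotonically increasing transform
print("ROC unchanged by the transformation:",
      np.allclose(F1, F2) and np.allclose(H1, H2))
```

Since only the ordering of the decision variable enters the counting, the two
curves coincide, which illustrates the invariance under monotonically
increasing transformations stated above.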

The obvious question is of course when a ROC curve should be considered good.
Arguably, a decision variable $r_1$ should be taken as superior to another
decision variable $r_2$ if, for any fixed false-alarm rate $F$, the hit rate
$H_1$ of $r_1$ is equal to or larger than the hit rate $H_2$ of $r_2$. If this
is the case, we will refer to $r_1$ as being uniformly superior to $r_2$. It
can be demonstrated that the decision variable $p(\chi_i = 1 | \mathbf{x}_i)$
is uniformly superior to any decision variable of the form $r(\mathbf{x}_i)$
(this follows from the Neyman-Pearson lemma [16] and the fact that
$p(\chi_i = 1 | \mathbf{x}_i)$ is a monotonically increasing function of the
likelihood ratio, see Eq. (22)).

As a consequence, the decision variable $p(\chi_i = 1 | r)$ is uniformly
superior to any transformation $\phi(r)$ (in particular, $r$ itself). If $r$ is
a function of $\mathbf{x}_i$, then $p(\chi_i = 1 | \mathbf{x}_i)$ is still
uniformly superior to $p(\chi_i = 1 | r(\mathbf{x}_i))$. An easy calculation
will reveal that the slope of the ROC curve is a monotonically increasing
function of $p(\chi_i = 1 | r)$, so replacing the decision variable $r$ with
the slope of the ROC curve at $r$ is an alternative way of getting the decision
variable $p(\chi_i = 1 | r)$. If the ROC curve is concave, though, then the
slope (as a function of $r$) is monotonically increasing, thus using it as a
new decision variable does not alter the ROC plot. We can conclude that a
concave ROC is already optimal in that it cannot be improved any further by a
transformation of the decision variable.

If we have to compare two arbitrary decision variables $r_1$ and $r_2$, then
the notion of “uniformly superior” is not so useful, as the two ROC curves
might cross. This is a problem if a criterion is required in order to optimize
a prediction algorithm, in particular, when we search for optimal precursors.
There is no reason why the ROC curves corresponding to any two predictors
should not cross. Hence, summary statistics of ROC curves are needed, for
example the following:

Proximity to (0,1): A good ROC should be close to the point (0,1), that is
where the false-alarm rate is zero while the hit rate is 1. The point closest
to (0,1) would simultaneously define an operation point for the algorithm.

Area under ROC curve: The area under the ROC curve (AUC) is a well established
summary index for ROC curves, which should be maximal. It
can be shown that this quantity gives the probability that on an instance
when the event takes place, the decision variable is actually larger than
on an independent instance when the event does not take place. It is a
global quantity, averaging over all alarm rates.

Maximal hit rate for fixed alarm volume: Optimizing ROC for precursors by 
asking for a maximal hit rate without any further constraints is not a 
useful criterion, since all decision variables have a maximum hit rate of
1, achievable by just always giving alarms. Fixing the alarm volume, this
criterion leads to precursors according to strategy I of Sect. 3. Note that 
the false alarm rate is not considered at all in such an optimization, so 
that the optimal hit rate for fixed alarm rate might be achieved at the 
cost of an unreasonably large false alarm rate. Inverting the criterion to 
minimizing the false alarm rate for fixed alarm volume leads to the same 
precursor. 

Ratio of hit rate and false alarm rate: A maximum ratio of hit rate versus 
false alarm rate in the limit of small false alarm rates yields another well 
established summary index, the slope of the ROC curve at the origin. If 
the ROC is concave, this is the same as the overall maximum ratio of hit 
rate versus false alarm rate. Maximizing this summary index leads to the 
prediction scheme called strategy II in Sect. 3. 
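
Two of these summary statistics, the AUC and the proximity to (0,1), can be
computed directly from samples of the decision variable. The sketch below uses
two synthetic Gaussian samples (unit variance, means 1 and 0), which are
assumptions made for illustration, and also checks the probabilistic
interpretation of the AUC by comparing all pairs of instances.

```python
import numpy as np

rng = np.random.default_rng(3)
r_event = rng.normal(1.0, 1.0, size=2_000)      # decision variable before events
r_none = rng.normal(0.0, 1.0, size=2_000)       # decision variable before non-events

# ROC curve: sweep the threshold over all observed values, large to small.
deltas = np.sort(np.concatenate((r_event, r_none)))[::-1]
H = np.concatenate(([0.0], [(r_event >= d).mean() for d in deltas]))
F = np.concatenate(([0.0], [(r_none >= d).mean() for d in deltas]))

# Area under the ROC curve by the trapezoidal rule.
auc = np.sum(np.diff(F) * (H[1:] + H[:-1]) / 2.0)

# Probability that r before an event exceeds r before an independent non-event,
# estimated from all (event, non-event) pairs.
p_rank = (r_event[:, None] > r_none[None, :]).mean()

# Operating point closest to the ideal corner (F, H) = (0, 1).
k = np.argmin(F ** 2 + (1.0 - H) ** 2)

print(f"AUC = {auc:.4f}, pairwise probability = {p_rank:.4f}")
print(f"point closest to (0,1): F = {F[k]:.3f}, H = {H[k]:.3f}")
```

For continuous data without ties the two numbers agree up to rounding, which
is the interpretation of the AUC given in the list above.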

It should be noted that a uniformly superior decision variable is superior
with respect to any mentioned summary index, but not vice versa. A decision 
variable might for example have a larger AUC than another, but still their 
ROC curves might cross, in which case neither of the two is uniformly superior 
to the other. 

4.4 Applying the Scoring Schemes for the Prediction 
of Extreme Events 

As has been argued in the context of the RMS error, the Brier score might
have its shortcomings if applied to forecasts of very unlikely events. If the
overall probability of the event is very small, a forecast which successfully
separates events from non-events gets little credit over a forecast which
plainly states that the event will not happen at all. A more formal analysis
shows that this is due to the loss function of the Brier score depending only
on the difference between the forecast and the correct probability.

The ignorance, in contrast, severely punishes erroneous forecasts close to
either one or zero. In fact, forecasting zero probability for an event which
actually occurs, or taking an event for certain which then fails to
materialize, yields a score of infinity. Hence the ignorance might be a more
appropriate score for extreme event forecasts, albeit a rather harsh one.

The ROC avoids a direct dependence on the overall probability of events
and non-events by definition:

    $$ H(\delta) := p(r > \delta \,|\, \chi_i = 1) = \frac{p(r > \delta, \chi_i = 1)}{p(\chi_i = 1)}, \qquad (15) $$

    $$ F(\delta) := p(r > \delta \,|\, \chi_i = 0) = \frac{p(r > \delta, \chi_i = 0)}{p(\chi_i = 0)}. \qquad (16) $$

Since the cumulative probabilities $p(r > \delta, \chi_i = 1)$ and
$p(r > \delta, \chi_i = 0)$ of giving an alarm and observing an event
(non-event) are normalized with the total probability to find events
(non-events), the rates do not depend explicitly on the total probabilities.
However, one cannot exclude an implicit dependence which is given through the
relation between precursor and event or the definition of the events.

5 Performance of the Prediction Schemes 

For the purpose of precise understanding, the two different prediction
strategies of the precursor based prediction schemes were evaluated in [17]
and [18] for an extremely simple time series model, an AR(1) process,
$x_{i+1} = a x_i + \xi_i$, with the correlation coefficient $a$ and the
sequence of normally distributed i.i.d. random numbers $\xi_i$. As events
$\chi_i$ we considered both threshold crossings, $x_i > \eta$, and large
increments, $x_{i+1} - x_i > \eta$. Due to the short-range correlation of the
process, we used only the last value $x_i$ as evidence, i.e., $\tau = 1$ and
the vector $\mathbf{x}_i$ reduces to $x_i$. Correspondingly, the precursor
consists of the scalar value $u$ and the alarm volume $V(\delta, u)$ becomes an
alarm interval $I(\delta, u)$. This setting allows us to compute all relevant
expressions analytically for the example of the AR(1) process. We can then
compare the different prediction strategies by creating ROC curves. This was
done by expressing the hit rate $H(\delta)$ and the rate of false alarms
$F(\delta)$ in terms of probability densities,

    $$ H(\delta) = \int_{I(\delta, u)} p(x_i | \chi_i = 1)\, dx_i, \qquad
       F(\delta) = \int_{I(\delta, u)} p(x_i | \chi_i = 0)\, dx_i. \qquad (17) $$

The values of the precursor $u$ were taken to be the location of the maximum of
either $p(x_i | \chi_i = 1)$ (strategy I) or $p(\chi_i = 1 | x_i)$
(strategy II).

5.1 Generation of ROC-Curves 

One can obtain numerical estimates for the conditional probabilities
$p(x_i | \chi_i = 1)$ and $p(\chi_i = 1 | x_i)$ by using a kernel estimator or
by “binning and counting”. Since sufficiently many data points were available
for all stochastic processes studied in this section, we used the latter
numerical method in order to compute the ROC curves shown in Figs. 1 and 2 (see
footnote 4). In the next step we identified the values of $x_i$ for which
$p(x_i | \chi_i = 1)$ and $p(\chi_i = 1 | x_i)$ are maximal and used them as
precursors $u_I$ and $u_{II}$. We can then define alarm intervals
$I(u_I, \delta)$ and $I(u_{II}, \delta)$ and count, for each value of $\delta$,
how many values of our process are within this interval and how many of the
created alarms actually were followed by an event. The analytical and the
numerical results for the AR(1) process were in good agreement and are
displayed in Fig. 1.
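
The procedure just described can be sketched in a few lines. The code below is
an illustrative reimplementation under stated assumptions, not the code used
for Fig. 1: the number of bins (40), the minimum bin population (50), the
sample size and the event magnitude (one standard deviation) are all choices
made here, and only the correlation coefficient a = -0.75 is taken from the
caption of Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(4)
a, eta, n = -0.75, 1.0, 200_000                 # eta in units of sigma (assumed)

# Simulate the AR(1) process x_{i+1} = a x_i + xi_i.
noise = rng.standard_normal(n)
x = np.empty(n)
x[0] = noise[0]
for i in range(n - 1):
    x[i + 1] = a * x[i] + noise[i + 1]

sigma = x.std()
evidence = x[:-1]                               # last observed value x_i
chi = x[1:] - x[:-1] > eta * sigma              # event: large increment

# Binning and counting.
edges = np.linspace(evidence.min(), evidence.max(), 41)
centers = 0.5 * (edges[:-1] + edges[1:])
n_all, _ = np.histogram(evidence, bins=edges)
n_evt, _ = np.histogram(evidence[chi], bins=edges)

p_x_given_event = n_evt / n_evt.sum()           # estimate of p(x_i | chi_i = 1)
populated = n_all >= 50                         # skip sparsely populated bins
p_event_given_x = np.where(populated, n_evt / np.maximum(n_all, 1), -1.0)

u_I = centers[np.argmax(p_x_given_event)]       # strategy I precursor
u_II = centers[np.argmax(p_event_given_x)]      # strategy II precursor

def roc(u, deltas):
    """Hit and false-alarm rates for alarm intervals |x_i - u| < delta."""
    dist = np.abs(evidence - u)
    d_evt, d_non = np.sort(dist[chi]), np.sort(dist[~chi])
    H = np.searchsorted(d_evt, deltas, side="left") / d_evt.size
    F = np.searchsorted(d_non, deltas, side="left") / d_non.size
    return F, H

def hit_rate_at(F, H, f):
    """Hit rate at the first grid point whose false-alarm rate reaches f."""
    return H[np.searchsorted(F, f, side="left")]

deltas = np.linspace(0.0, 2.0 * (evidence.max() - evidence.min()), 400)
F_I, H_I = roc(u_I, deltas)
F_II, H_II = roc(u_II, deltas)

print(f"u_I = {u_I:.2f}, u_II = {u_II:.2f}")
print("hit rate at F = 0.1, strategy I :", hit_rate_at(F_I, H_I, 0.1))
print("hit rate at F = 0.1, strategy II:", hit_rate_at(F_II, H_II, 0.1))
```

In this example the strategy II precursor lies far out in the lower tail of
the distribution, where nearly every value is followed by a large increment,
whereas the strategy I precursor lies where events are most frequently
preceded, which is closer to the bulk of the data.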


5.2 Comparing Different Strategies of Identifying the Precursor 

The first essential finding is that, consistent with our theoretical
investigations, strategy II is uniformly superior to strategy I, for arbitrary
parameters of the AR(1) process and for all event sizes. Figure 1 illustrates
this result, which is in good agreement with the theoretical considerations in
Sect. 4.3 on optimal ROC curves obtained by using $p(\chi_i = 1 | x_i)$ as
decision variable.

Fig. 1: Performance of strategies I and II for the prediction of increments
generated by an AR(1) process with correlation coefficient $a = -0.75$.
Different event magnitudes $\eta = (x_{i+1} - x_i)/\sigma$, with $\sigma$ being
the standard deviation of the process under study, are compared. (Horizontal
axis: rate of false alarms.)

4 In Sect. 6.1 we applied a box-kernel estimator for the same purpose. The
details of the box-kernel estimator will be explained in Sect. 6.1.

We will now try to obtain an intuitive understanding of the superiority of
strategy II (finding a precursor which maximizes
$p(\chi_i = 1 | \mathbf{x}_i)$) over strategy I (maximizing
$p(\mathbf{x}_i | \chi_i = 1)$) by investigating the slope of the ROC plot in
the vicinity of the origin. We therefore assume again the more general case of
a multi-dimensional decision variable $\mathbf{x}$ and a multi-dimensional
precursor $\mathbf{u}$. As shown above, the hit rate $H(\delta)$ and the false
alarm rate $F(\delta)$ (for arbitrary precursor $\mathbf{u}$) can be expressed
by conditional probability densities $p(\mathbf{x}_i | \chi_i)$,

    $$ H(\delta) = \int_{V(\delta, \mathbf{u})} p(\mathbf{x}_i | \chi_i = 1)\, d\mathbf{x}_i, \qquad
       F(\delta) = \int_{V(\delta, \mathbf{u})} p(\mathbf{x}_i | \chi_i = 0)\, d\mathbf{x}_i, \qquad (18) $$

and the slope $m$ of the ROC curve is given by

    $$ m = \frac{dH(\delta)}{dF(\delta)}. \qquad (19) $$

For small alarm volumes $V(\delta, \mathbf{u})$,

    $$ H(\delta) \approx c\,\delta^{\tau}\, p(\mathbf{u} | \chi_i = 1), \qquad
       F(\delta) \approx c\,\delta^{\tau}\, p(\mathbf{u} | \chi_i = 0). \qquad (20) $$


The geometry parameter $c$ defines how the alarm volume scales with
$\delta^{\tau}$. Inserting Eq. (20) in Eq. (19), this factor cancels out. Hence
the slope of the ROC curve in the vicinity of the origin is given by

    $$ m \approx \frac{p(\mathbf{u} | \chi_i = 1)}{p(\mathbf{u} | \chi_i = 0)}. \qquad (21) $$

The right hand side is known as the likelihood ratio. Using Bayes' theorem,
the right side of Eq. (21) can be written in terms of the conditional
probability $p(\chi_i = 1 | \mathbf{u})$ and the total probability
$p(\chi_i = 1)$ to find events:

    $$ m \approx \frac{p(\chi_i = 1 | \mathbf{u})}{1 - p(\chi_i = 1 | \mathbf{u})}
       \cdot \frac{1 - p(\chi_i = 1)}{p(\chi_i = 1)}. \qquad (22) $$

Note that the total probability to find events is determined by the process
under study and does not influence the choice of the precursor. The specific
precursor $\mathbf{u}$ which maximizes $m$ is given by setting the derivative
of $m$ with respect to $\mathbf{u}$ equal to zero. One easily finds that this
requires $\partial p(\chi_i = 1 | \mathbf{u}) / \partial \mathbf{u} = 0$, a
condition which is fulfilled by the $\mathbf{u}$ which maximizes
$p(\chi_i = 1 | \mathbf{u})$. This is exactly what we called strategy II
before. Strategy I aims at maximizing $p(\mathbf{u} | \chi_i = 1)$, but does
not take the denominator of the likelihood ratio (see Eq. (21)) into account.
Hence we have shown that in the vicinity of the origin strategy II is always
superior or equal to strategy I. This corresponds to the considerations in
Sect. 4.3 concerning the uniform superiority of predicting through the
conditional probability $p(\chi_i = 1 | \mathbf{x}_i)$.

In the limit of small alarm volumes, the probabilistic forecast and the
precursor based prediction according to strategy II are equivalent. However,
for larger alarm volumes the influence of the specific structure of
$p(\chi_i = 1 | \mathbf{x}_i)$ leads to slightly different predictions,
especially when $p(\chi_i = 1 | \mathbf{x}_i)$ is not symmetric around its
maximum or exhibits multiple maxima. Then the alarm volume
$V(\delta, \mathbf{u})$ in general does not match any level set of
$p(\chi_i = 1 | \mathbf{x}_i)$.
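
The Bayes step and the stationarity condition used in the argument above can
be spelled out as follows. The abbreviations $g(\mathbf{u})$ and $c_0$ are
introduced here only for brevity and do not appear in the original text.

```latex
% Abbreviations (assumed for this derivation only):
%   g(u) = p(\chi_i = 1 | u),   c_0 = p(\chi_i = 1).
\begin{align*}
p(\mathbf{u}\,|\,\chi_i = 1) &= \frac{g(\mathbf{u})\,p(\mathbf{u})}{c_0},
\qquad
p(\mathbf{u}\,|\,\chi_i = 0) = \frac{\bigl(1 - g(\mathbf{u})\bigr)\,p(\mathbf{u})}{1 - c_0}, \\[4pt]
m &\approx \frac{p(\mathbf{u}\,|\,\chi_i = 1)}{p(\mathbf{u}\,|\,\chi_i = 0)}
   = \frac{g(\mathbf{u})}{1 - g(\mathbf{u})}\cdot\frac{1 - c_0}{c_0}, \\[4pt]
\frac{\partial m}{\partial \mathbf{u}} &= \frac{1 - c_0}{c_0}\,
   \frac{1}{\bigl(1 - g(\mathbf{u})\bigr)^{2}}\,
   \frac{\partial g(\mathbf{u})}{\partial \mathbf{u}} .
\end{align*}
```

The density $p(\mathbf{u})$ cancels in the ratio, and the prefactor of
$\partial g / \partial \mathbf{u}$ is strictly positive for $g < 1$. The slope
$m$ is therefore stationary exactly where $g$ is, and because $g/(1-g)$
increases monotonically with $g$, the maximum of $m$ is attained at the maximum
of $p(\chi_i = 1 | \mathbf{u})$.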

5.3 The Influence of the Event Magnitude 

As the second relevant finding we quote here results from detailed studies 
which show that for a large variety of processes, the larger the events to be 
predicted, the better is the corresponding ROC curve [17, 19]. 

The mentioned studies provide a deeper understanding of empirical
observations reported for the prediction of avalanches in models which display
self-organized criticality [20] and in multi-agent games [21]. Our investigations
also led to a criterion as to whether larger events are better predictable than 
smaller or not, depending on the joint distribution of precursor and event [19]. 
With some restrictions imposed on the length of the data set, the criterion 
can be evaluated numerically for arbitrary time series data. 

We found some especially interesting results for the prediction of large
increments. Analytical studies show that, in terms of the ROC, increments are
predicted the better the larger they are, if the probability distribution of
the process under study is Gaussian. For data following a symmetric exponential
distribution, $p(x) \propto \exp(-\gamma |x|)$, the dependence on the event
magnitude is less pronounced than in the Gaussian case, while for data whose
distribution has a power-law tail, $p(x) \propto x^{-\alpha}$ for $x > 0$ and
$\alpha > 2$, larger increments are harder to predict than smaller increments.
This is illustrated in Fig. 2. It is intuitively clear that for short term
predictions, only the short range correlation structure is relevant. Hence
results for short range correlated processes can be qualitatively transferred
to processes of arbitrary correlation, as long as they exhibit the same joint
distribution of event and precursor. We studied a class of long range
correlated Gaussian data numerically and confirmed the improved predictability
for larger events also in this situation [19].
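
The Gaussian case can be illustrated with a small numerical experiment. The
sketch below is an assumption-laden toy version of the setting of Fig. 2a, not
the original computation: it uses i.i.d. standard normal data ($\sigma = 1$)
and takes $-x_i$ as decision variable. For i.i.d. data the probability of an
increment larger than $\eta$ decreases monotonically with $x_i$, so $-x_i$
yields the same ROC curve as the conditional probability itself.

```python
import numpy as np

def auc(r_event, r_none):
    """Probability that the decision variable before an event exceeds the one
    before an independent non-event (equals the area under the ROC curve
    when there are no ties)."""
    below = np.searchsorted(np.sort(r_none), r_event, side="left")
    return below.mean() / r_none.size

rng = np.random.default_rng(5)
x = rng.standard_normal(400_000)        # i.i.d. Gaussian data, sigma = 1
r = -x[:-1]                             # small x_i favours a large increment
incr = x[1:] - x[:-1]

results = {}
for eta in (0.0, 1.0, 2.0, 3.0):
    chi = incr > eta                    # event: increment larger than eta
    results[eta] = auc(r[chi], r[~chi])
    print(f"eta = {eta:.1f}   event rate = {chi.mean():.4f}   AUC = {results[eta]:.3f}")
```

For $\eta = 0$ the AUC can be worked out by a counting argument: before an
event $x_i$ is the smaller of two independent values, before a non-event it is
the larger one, and the smaller value of one pair lies below the larger value
of an independent pair with probability 5/6. The AUC values for the larger
event magnitudes exceed this number.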

However, in the analogous study for the prediction of threshold crossings
in AR(1) processes with Gaussian, approximately exponential and approximately
power-law distributions, we obtained qualitatively different results [18].
In contrast to the results for increments, threshold crossings became more
predictable for all tested distributions the larger the specified threshold
was.


6 Application to Experimental Data 

6.1 The Influence of the Event Rate on the Brier Score 

To illustrate the concepts introduced in the previous sections, we will
perform predictions of wind speeds, with the aim of forecasting the occurrence
of particularly strong wind gusts, that is, of sudden increases of the wind
speed. We used recordings of horizontal wind speed, sampled with 8 Hz at 30 m
above ground at the Lammefjord measurement site [22]. We fixed a gust strength
$g$ and defined $\chi_i = 1$ if $x_{i+\Delta} - x_i > g$, where $x_i$,
$i = 1, \ldots, N$, are wind speed measurements. The time horizon $\Delta$
between the prediction and the event corresponds to the time through which the
increment is defined. Since the data is strongly correlated, we chose a time
horizon of $\Delta = 32$ in order to observe sufficiently many large
increments. The time horizon $\Delta = 32$ corresponds to an increase of the
wind speed 4 s ahead in time. Various gust strengths $g$ were considered, and
we also compared different values for the conditioning $\tau$, that is,
conditional probabilities
$p(\chi_i = 1 | x_i, x_{i-1}, \ldots, x_{i-\tau+1})$ for various values of
$\tau$ were considered as basis for our predictions.

Fig. 2: Typical ROC curves for the prediction of large increments via strategy
II, for different event magnitudes $\eta = (x_{i+1} - x_i)/\sigma$, with
$\sigma$ being the standard deviation of the process under study. Panel a:
predictions within normal i.i.d. random variables, panel b: predictions within
symmetric exponential i.i.d. random variables, panel c: predictions within
power law i.i.d. random variables, with exponent $\alpha = 4$, panel d:
predictions within a long range correlated Gaussian process. Each data set
contained about $10^7$ data points. (Horizontal axis: rate of false alarms.)

The following results are based on $10^6$ prediction trials at equidistant
times, where for every sequence
$\mathbf{x}_i = (x_i, x_{i-1}, \ldots, x_{i-\tau+1})$ of $\tau$ successive
observations, the prediction was compared to the known value of $\chi_i$. All
information needed for the prediction is extracted from the same time series.

Fig. 3: The Brier scores for wind gust prediction using as predictor
$p = p(\chi_i = 1 | \mathbf{x}_i)$, compared to the constant predictions
$p = c$ and $p = 0$. The larger the magnitude $g$ of the events, the smaller
their rates $c$. The confidence intervals of the Brier scores are derived from
ten disjoint samples of $10^5$ prediction trials each. The inset shows the
scores again but reduced by the score $c(1 - c)$ of the predictor $p = c$.

The conditional probabilities $p(\chi_i | \mathbf{x}_i)$ can be estimated
either by binning and counting or by a kernel estimator. The latter is more
time consuming but less memory consuming and slightly more accurate if the
dimension $\tau$ of the condition $\mathbf{x}_i$ is large. We use a box-kernel
of width $\epsilon$, hence

    $$ p(\chi_i = 1 | \mathbf{x}_i) \approx
       \frac{\sum_{j=1}^{N} \chi_j\, \Theta(\epsilon - \|\mathbf{x}_i - \mathbf{x}_j\|)\, \Theta(|i - j| - 100)}
            {\sum_{j=1}^{N} \Theta(\epsilon - \|\mathbf{x}_i - \mathbf{x}_j\|)\, \Theta(|i - j| - 100)}. $$

The right hand side denotes the relative number of events following those
vectors $\mathbf{x}_j$ which are in the $\epsilon$-neighborhood of
$\mathbf{x}_i$. The second $\Theta$-function, acting on the time indices,
simply excludes all those time series elements from the estimation which are
too close in time to the actual observation and hence might be correlated with
it. This guarantees that we perform true out-of-sample predictions, since all
sample points with time indices $j$ such that $|j - i| < 100$ are thus ignored.
We expect those sample points to be highly correlated with $\mathbf{x}_i$,
which would allow good predictions to be formed. But since these sample points
are not available in a real forecast situation, including them here would lead
to overoptimistic performance assessments. For every trial, we numerically
estimate the conditional probability
$p(\mathbf{x}_i) = p(\chi_i = 1 | \mathbf{x}_i)$ with an adaptive kernel size,
thereby ensuring a local sample size of at least 20 points. For these predicted
probabilities we compute the average Brier score as performance indicator. This
indicator can be compared to the score of the constant prediction $p = c$, where
$c$ is the rate of events, $c = \sum_i \chi_i / N$. This rate depends on the
gust magnitude $g$ and is very small if $g$ is large. This simple predictor has
an average Brier score of $c(1 - c)$. For small event rates $c$, it is also
reasonable to use the trivial constant prediction $p = 0$, that is, to predict
the event never to happen. This even simpler predictor has an average Brier
score of $c$ (only missed events
get counted). In Fig. 3, the Brier scores of these three prediction schemes are
shown as functions of the event rate. The skills of the two constant predictors
$p = 0$ and $p = c$ are fully determined by the event rate, that is, they are
independent of any information about the future, as stored in the preceding
time series elements, and are therefore independent of any temporal
correlations in the data sequence. The dependence of these two predictors on
the rate demonstrates that the absolute value of the Brier score (and hence its
interpretation) depends on the event rate, which renders a comparison between
prediction schemes for different event rates difficult. The remarkable finding
is that the Brier score of the predictor $p = p(\chi_i = 1 | \mathbf{x}_i)$,
which at least in theory should not be inferior to the constant predictor
$p = c$, is only insignificantly better than the latter when the event rate is
very small. In this situation, the score of the “null” predictor $p = 0$ is
also almost as good. This demonstrates that the Brier score is not very useful
when the event rate is small. The inset of Fig. 3 shows the same results but
presented in a different way: We plot the difference between the scores of the
predictors and the score of the constant rate predictor $p = c$.
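
The box-kernel estimator with a temporal exclusion window, as described in the
text, can be sketched as follows. This is a simplified illustration, not the
authors' implementation: it uses a fixed kernel width instead of the adaptive
one, a synthetic correlated series instead of the wind data, and the function
name `kernel_probability` as well as all parameter values are assumptions.

```python
import numpy as np

def kernel_probability(i, X, chi, eps, exclude=100):
    """Box-kernel estimate of p(chi_i = 1 | x_i).

    X is the (N, tau) array of delay vectors and chi the event series.
    Vectors closer than `exclude` steps in time to index i are left out."""
    j = np.arange(len(X))
    near = np.linalg.norm(X - X[i], axis=1) < eps    # inside the eps-neighbourhood
    far_in_time = np.abs(j - i) >= exclude           # outside the exclusion window
    mask = near & far_in_time
    if not mask.any():
        return np.nan                                # no neighbours: undefined
    return chi[mask].mean()

rng = np.random.default_rng(6)
n, tau, horizon, g = 5_000, 2, 4, 1.0                # all values assumed
x = np.zeros(n)
for t in range(1, n):                                # synthetic correlated series
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

idx = np.arange(tau - 1, n - horizon)
X = np.stack([x[idx - k] for k in range(tau)], axis=1)   # rows (x_i, x_{i-1})
chi = (x[idx + horizon] - x[idx] > g).astype(int)        # increment events

trials = np.arange(0, len(X), 10)                    # every tenth point as trial
p_hat = np.array([kernel_probability(i, X, chi, eps=0.5) for i in trials])
ok = ~np.isnan(p_hat)

c = chi.mean()
print("event rate c          :", c)
print("Brier, kernel forecast:", np.mean((p_hat[ok] - chi[trials][ok]) ** 2))
print("Brier, constant p = c :", np.mean((c - chi[trials][ok]) ** 2))
```

A fixed width leaves some trial points in sparsely populated regions without
neighbours; the adaptive kernel size mentioned in the text, which enforces a
minimum local sample size, avoids this.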



Fig. 4: The Brier score and the rate of false alarms for gust prediction of
wind speeds. The predicted probability $p$ is converted into warnings whenever
it exceeds the threshold value $p_c$. For large values of $p_c$, the scores for
different gust strengths saturate at the respective event rates $c$. For small
$p_c$ and small event rates $c$, the Brier score of these “filtered”
probabilities is dominated by the false alarms. The horizontal line at
$y = 0.31$ just shows that for the event rate $c \approx 0.31$ the Brier score
has a nontrivial minimum around $p_c \approx 0.45$. (Horizontal axis: $p_c$.)


In order to convert probabilistic predictions into warnings, one would
introduce a threshold $p_c$ and predict a “filtered” probability
$\tilde{p} = \Theta(p - p_c)$, so that whenever the predicted probability is
below $p_c$, a filtered probability of 0 is issued, and a probability of unity
(or a warning) otherwise. An example in which this method is applied is
presented in Fig. 4. Inspecting Eqs. (8) and (9) for these special
probabilities $\tilde{p}$ shows that the Brier score is the number of false
alarms plus the number of unpredicted events, normalized by the number of
prediction trials. If the warnings were given randomly with constant rate,
then the Brier score would be a linear interpolation between the event rate
(warning rate being zero: no hits) and one minus the event rate (warning rate
unity: maximal number of false alarms). Numerical results of the score of
$\tilde{p}$ as a function of $p_c$, obtained for a set of $g$-values ranging
from 0.5 to 3.5, are shown in Fig. 4. As suspected earlier, for small event
rates, the Brier score is best if no alarms are ever given ($p_c = 1$), since
then no false alarms are made, at the cost of missing the large but rather few
events. For low rates and low thresholds $p_c \approx 0$, the Brier score is
almost identical to the false alarm rate, which is also shown. Only for the
largest event rate $c \approx 0.31$ (gust strength $g = 0.5$), we find a
nontrivial minimum of the Brier score with this scheme.
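
The decomposition of the Brier score of filtered probabilities into false
alarms and missed events is easy to verify numerically. The sketch below uses
synthetic forecast probabilities drawn from a Beta distribution, an assumption
made purely for illustration; the helper name `filtered_brier` is likewise
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
p = rng.beta(0.5, 4.0, size=n)                  # synthetic forecast probabilities
chi = (rng.random(n) < p).astype(int)           # events consistent with them

def filtered_brier(p, chi, p_c):
    """Brier score of the 0/1 warnings issued when p >= p_c,
    together with the counts of false alarms and missed events."""
    warn = (p >= p_c).astype(int)
    false_alarms = int(np.sum((warn == 1) & (chi == 0)))
    missed = int(np.sum((warn == 0) & (chi == 1)))
    score = np.mean((warn - chi) ** 2)
    return score, false_alarms, missed

for p_c in (0.0, 0.25, 0.5, 0.75, 1.0):
    score, fa, miss = filtered_brier(p, chi, p_c)
    print(f"p_c = {p_c:.2f}   Brier = {score:.4f}   "
          f"(false alarms + misses)/N = {(fa + miss) / n:.4f}")
```

At $p_c = 0$ a warning is always given and the score equals one minus the event
rate, while at $p_c = 1$ no warning is given and the score equals the event
rate, which are the two saturation values discussed in the text.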

Without using any additional scoring scheme, the findings discussed so
far might suggest that large wind gusts are not predictable. The ROC plot
obtained from the prediction scheme $p = p(\chi_i = 1 | \mathbf{x}_i)$ for the
same range of threshold gust strengths is shown in Fig. 5. This plot clearly
demonstrates predictive skill of this algorithm. We note two interesting
observations: First, the larger the magnitude of the event (the larger $g$),
the better is the ROC curve. Second, when we increase the conditioning from
$\tau = 1$ to $\tau = 8$, the predictive skill in the range of small false
alarm rates improves. Whereas the first finding is understood fairly well
theoretically (see Sect. 5.3), the second one suggests nontrivial temporal
correlations in the wind speed data.

Although in this study we focused on the Brier score, we argue that other 
scores such as the ignorance would suffer from similar difficulties, in particular 
if predicted probabilities are to be converted into actual warnings. The reason 
for this is that the interpretation of forecast skill in terms of scores is difficult 
if they depend in a nontrivial way on the event rate of the process. The need to 
convert probabilistic predictions into alarms for practical purposes turns the
prediction of rare extreme events into a classification problem. Classification
problems though can be conveniently evaluated by the ROC statistics. For the 
remainder of the paper, we will therefore restrict ourselves to ROC analysis 
of prediction schemes. 

6.2 Prediction of Increments in a Free Jet Flow 

In a last step, we study the influence of the event magnitude in experimental
data. This is done by applying the prediction scheme called strategy II
to experimental data with the aim of predicting increments. The potential 
complications are that stationarity is violated to a smaller or larger extent, 
that the correlation structure is more complicated, and that the distribution 
is only approximately one of the above studied classes. 

Fig. 5: The ROC statistics for the prediction of large increments of wind
speeds (wind gusts) for different gust strengths using the estimated values
$p = p(\chi_i = 1 | \mathbf{x}_i)$. On the right, the lower bundle of curves
refers to $\tau = 1$, i.e., $\mathbf{x}_i$ has only one component (the most
recent observation), whereas the upper bundle represents predictions where the
condition $\mathbf{x}_i$ is the vector of the $\tau = 8$ last observations. In
this example, the predictive skill improves when the condition is extended.
Every bundle of curves represents gust strengths ranging from $g = 0.5$ to
$g = 3.5$, from bottom to top. On the left, the influence of the event size is
studied for $\tau = 1$. One can notice the better predictability of larger and
thus rarer events. In both ROCs the overall performance is rather poor (all
lines fall close to the diagonal), but the larger the gust strength, the better
the predictability. Predictability can be further improved by redefining wind
gusts to be large increments occurring in future time intervals rather than at
a specific instance in time.

We start with data from a well controlled laboratory experiment, namely
from free jet turbulence [23]. Using hot wire anemometry, the velocity of the
air in front of a nozzle is measured at a sampling rate of 8 Hz and at a
position where the flow can in good approximation be considered as being
isotropically turbulent. Taking increments $a_i$ of such a sequence $v_i$ over
short time intervals, $a_i = v_{i+k} - v_i$, for $k$ small, yields
approximately a symmetric exponential distribution for $a_i$, whereas for long
time intervals, i.e., large $k$, the distribution of the increments is
approximately Gaussian [24, 25] (see footnote 5 and Fig. 6). The time horizon
$\Delta$ between the prediction and the event corresponds to the time through
which the increment is defined. Since the data is strongly correlated, we
choose a time horizon $\Delta = 285$, which corresponds to an increment
35.625 s ahead in time, in order to observe sufficiently many large increments.
Note that the increment size is defined again in units of the standard
deviation $\sigma$ of the process under study, i.e.,
$\eta = (a_{i+\Delta} - a_i)/\sigma$.

In Fig. 7, we show the ROC statistics for the prediction of large increments
$a_{i+\Delta} - a_i$ in the increment time series $\{a_i\}$. The ROC statistics
were generated according to the algorithm described in Sect. 5.1.


5 For longitudinal velocity increments, one wing of the distributions is higher
than the other. This effect can be understood via Kolmogorov's four-fifths law,
which demands a non-zero skewness of the velocity increment [26].

As predicted by the theoretical considerations for symmetric exponentially
distributed i.i.d. random numbers, the dependence on the event magnitude
is less pronounced if the distribution of the data is approximately
exponential, which corresponds to small values of $k$. In contrast, in the case
of large $k$ the probability distribution is approximately Gaussian and
larger increments are significantly better predictable. Both predictions were
made by determining precursors in the first part of the data set
($7 \cdot 10^6$ data points) and then predicting increments in the second part
of $\{a_i\}$ (also $7 \cdot 10^6$ data points).



Fig. 6: Histograms for velocity increments $a_i = v_{i+k} - v_i$ in a free jet flow. For 
small $k$, e.g., $k = 3$, the pdfs of the increment time series $\{a_i\}$ follow 
approximately an exponential distribution; for larger $k$ the pdfs of the increments 
are approximately Gaussian. 


6.3 Prediction of Increments in Wind Speeds 

As a second example, we study wind speeds measured at a site about 66 m 
above ground at a sampling rate of 1 Hz. These data reflect the 
full complications of field measurements, including non-stationarity and 
inaccuracies, but also represent a much more complicated turbulent state, namely 
boundary layer turbulence, which is strongly affected by the interaction of the 
air flow with the earth’s surface. Hence the deviations from the asymptotic 
distributions are larger than in the laboratory experiment, especially in the 
tails of the distributions, as shown in Fig. 8. 

We predict large increments in the acceleration of the wind, so-called 
turbulent gusts, which are of relevance for controlling wind turbines or scheduling 
aircraft take-off and landing. The time horizon $\Delta$ between the prediction and 
the event corresponds to the time through which the increment is defined. 
Since the data are strongly correlated, we choose a time horizon $\Delta = 35$, which 
corresponds to an increment 35 s ahead in time, in order to observe sufficiently 
many large increments. Note that the increment size is again defined in units 
of the standard deviation $\sigma$ of the process under study, i.e., $\eta = (a_{i+\Delta} - a_i)/\sigma$. 
Again, the predictions were made by identifying the precursors in the first 
part of the data set and then predicting in the second part. The ROC curves 
(Fig. 9) again show a better predictability of larger events if the data set $\{a_i\}$ 
is asymptotically Gaussian distributed and a much weaker dependence on the 
event size in the asymptotically exponential case. 

Fig. 7: ROC curves for the prediction of increments $\Delta = 285$ time steps ahead in 
the increment time series $a_i$ of isotropic turbulence. For increments defined by short 
time intervals (left panel), the predictability is almost independent of the event 
magnitude, whereas for increments defined by large time intervals (right panel), 
larger events are better predictable. 

Fig. 8: Histograms of velocity increments in wind speed. Again, we find that on 
short scales the pdfs of the increments follow approximately an exponential 
distribution, whereas the increments are approximately Gaussian distributed for larger 
$k$. However, the deviations from the asymptotic distributions are larger than in the 
laboratory experiment, especially in the tails of the distributions. 






Fig. 9: ROC statistics (hit rate versus rate of false alarms) for the prediction of large 
increments of wind speeds for different event magnitudes, predicting $\Delta = 35$ time steps ahead. 


7 Conclusion 

We presented an overview of some aspects of the prediction of rare events 
$\chi_i$ in time series and showed how to make use of the properties of the process to 
construct predictors. As a general result, the prediction scheme should exploit 
the conditional probability $p(\chi_i = 1 | x_i)$ and not $p(x_i | \chi_i = 1)$. In practice, a 
prediction from $p(\chi_i = 1 | x_i)$ can be drawn either by using the probability itself 
or by using the values of $x_i$ for which $p(\chi_i = 1 | x_i)$ is maximal as precursory 
structures. 

Furthermore, we discussed the role of several scoring schemes with respect 
to their performance for the prediction of extreme and thus rare events. We 
find that the RMS and the Brier score are not the optimal scoring schemes for 
the prediction of rare events, since they involve an averaging over the whole 
number of prediction trials. Hence, the influence of the correctly predicted rare 
events is suppressed by the large number of non-events. 
The ignorance tries to avoid this effect by taking the logarithm of the 
forecast probabilities. The ROC statistics are also particularly suitable for the 
prediction of rare events, since they do not explicitly depend on the rate of 
the events, provided that there are sufficiently many events to create the ROC 
statistics. 

Another aspect which we addressed is the dependence of the performance 
on the magnitude of the events under study. One can show that for stochastic 
processes, this dependence is mainly given by the underlying probability 
distribution of the process under study. This leads to particularly interesting 
results for the prediction of increments. If the probability distribution of the 
process under study is Gaussian, increments become better predictable the 
larger they are. If the probability distribution is exponential, the dependence 
on the event magnitude is less pronounced than in the Gaussian case, and 
for power-law distributed processes we find that increments become harder 
to predict the larger they are. 

The corresponding results for the prediction of threshold crossings are 
qualitatively different. We find for processes with Gaussian, exponential and 
power-law probability distributions that threshold crossings become better 
predictable the larger they are. 

By applying the previously developed concepts for the prediction of 
increments in the acceleration of a free jet flow and in wind speed measurements, 
we showed that the dependence on the increment magnitude, which was previously 
described and theoretically understood for stochastic processes, can 
also be found in experimental data which exhibit non-stationarity and long-range 
correlations. 

Abstract. Discrete wavelet transforms (DWTs) are mathematical tools that are 
useful for analyzing geophysical time series. The basic idea is to transform a time 
series into coefficients describing how the series varies over particular scales. One 
version of the DWT is the maximal overlap DWT (MODWT). The MODWT leads to 
two basic decompositions. The first is a scale-based analysis of variance known as the 
wavelet variance, and the second is a multiresolution analysis that reexpresses a time 
series as the sum of several new series, each of which is associated with a particular 
scale. Both decompositions are illustrated through examples involving Arctic sea ice 
and an Antarctic ice core. A second version of the DWT is the orthonormal DWT 
(ODWT), which can be extracted from the MODWT by subsampling. The relative 
strengths and weaknesses of the MODWT, the ODWT and the continuous wavelet 
transform are discussed. 


Keywords: Arctic sea ice, Haar wavelet, Ice cores, Maximal overlap discrete 
wavelet transform, Multiresolution analysis, Wavelet spectrum, Wavelet 
variance 


1 Introduction 

The widespread use of wavelets to analyze data in the geosciences can be 
traced back to work by Morlet and coworkers [1, 2] in the early 1980s. Their 
efforts were motivated by signal analysis in oil and gas exploration and 
resulted in the continuous wavelet transform (CWT). Work in the late 1980s 
by Daubechies, Mallat and others [3, 4, 5, 6] led to various discrete wavelet 
transforms (DWTs), which are the focus of this article. While CWTs and 
DWTs are closely related, DWTs are more amenable to certain types of 
statistical analysis, making them the transform of choice for tackling certain 
(but not all) problems of interest in geophysical data analysis. The intent of 
this article is to give an overview of how DWTs can be used in the analysis 
of geophysical time series, i.e., sequences of observations recorded over time 
(usually at regularly spaced intervals such as once per second). 

The remainder of this article is structured as follows. In Sect. 2 we review 
the important notion of scale and the basic ideas behind the maximal overlap 
DWT (MODWT). The MODWT leads to two basic decompositions. The first 
(the subject of Sect. 3) is a scale-based analysis of variance known as the 
wavelet variance (or wavelet spectrum). The second (Sect. 4) is an additive 
decomposition known as a multiresolution analysis, in which a time series is 
reexpressed as the sum of several new series, each associated with a particular 
physical scale. In Sect. 5 we discuss another form of the DWT known as the 
orthonormal DWT (ODWT) that can be extracted from the MODWT and 
that has certain strengths and weaknesses in comparison to the MODWT. 
Our overview concentrates on the so-called Haar wavelet, but we note the 
existence of other wavelets in Sect. 6 and discuss why they might be preferred 
over the Haar wavelet for certain types of analyses. Finally we make some 
concluding comments in Sect. 7, including a comparison of the strengths and 
weaknesses of DWTs and CWTs. 


2 Maximal Overlap Discrete Wavelet Transform 

Let $X_n$, $n = 0, 1, \ldots, N-1$, represent the $n$th value of a time series that has 
$N$ values in all. We assume that, for all $n$, the time at which $X_n$ was observed 
can be expressed as $t_0 + n\Delta$, where $t_0$ is the time associated with $X_0$, and $\Delta$ 
is the sampling interval between any two adjacently recorded values $X_n$ and 
$X_{n+1}$. Given $\tau_j = 2^{j-1}$ for some positive integer $j$, consider 

$$A_{j,n} \equiv \frac{1}{\tau_j} \sum_{l=0}^{\tau_j - 1} X_{n-l}, \tag{1}$$

which is the average of $\tau_j$ adjacent values of the series starting with $X_{n-\tau_j+1}$ 
and ending at $X_n$. We refer to the above as a scale $\tau_j$ average. The variable $\tau_j$ 
is sometimes called a dyadic scale since its values are restricted to be powers 
of two. It is a dimensionless scale that is associated with a physical scale of 
$\tau_j \Delta$. Since $\tau_1 = 1$ and hence $A_{1,n} = X_n$, we can think of the original series 
as being unit scale ‘averages’. 

The definition for $A_{j,n}$ makes sense as long as $\tau_j - 1 \le n \le N - 1$; however, 
$A_{j,n}$ is ill-defined when $\tau_j \ge 2$ and $0 \le n \le \tau_j - 2$ because (1) would then 
involve $X_{-1}$ and possibly other values of the time series we don’t have access 
to. To force $A_{j,n}$ to be well defined for the full range $0 \le n \le N - 1$, we 
assume that the time series is periodic with a period of $N$; i.e., $X_n = X_{n+N}$ 
for all integers $n$. With this definition, $X_{-1} = X_{N-1}$, $X_{-2} = X_{N-2}$ and so 
forth. This assumption introduces some ‘boundary’ averages such as $A_{2,0} = 
(X_0 + X_{N-1})/2$, which combine nonadjacent values from the original series 
when $N > 2$. For $\tau_2 = 2$, the only possible boundary average is $A_{2,0}$, while 
the other $N - 1$ averages $A_{2,1}, \ldots, A_{2,N-1}$ involve adjacent values from the 
time series. 

If we let $a_{j,l} = 1/\tau_j$ for $0 \le l \le \tau_j - 1$, we can reexpress (1) in filtering 
notation as 

$$A_{j,n} = \sum_{l=0}^{\tau_j - 1} a_{j,l} X_{n-l}, \quad n = 0, 1, \ldots, N - 1.$$

The left-hand portion of Fig. 1 shows the filters $a_{j,l}$ for the dyadic scales 
indexed by $j = 1, 2, 3$ and 4 (we define $a_{j,l}$ to be zero when $l < 0$ or $l \ge \tau_j$). 
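
As a minimal sketch of (1) under the periodic assumption, the following code computes scale $\tau_j$ averages with circular indexing. The function name `scale_average` and the sample values are illustrative assumptions, not part of the original text.

```python
def scale_average(x, j, n):
    """Scale tau_j = 2**(j - 1) average A_{j,n}, treating x as periodic."""
    N = len(x)
    tau = 2 ** (j - 1)
    # Average X_n, X_{n-1}, ..., X_{n-tau+1}; negative indices wrap around.
    return sum(x[(n - l) % N] for l in range(tau)) / tau

x = [2.0, 4.0, 6.0, 8.0, 10.0]
# Unit scale 'averages' reproduce the series itself.
print([scale_average(x, 1, n) for n in range(len(x))])
# Scale 2 averages; the first entry is the boundary average (x[0] + x[-1]) / 2.
print([scale_average(x, 2, n) for n in range(len(x))])
```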




Fig. 1: Averaging filters $a_{j,l}$ (left-hand panel) for dyadic scales $\tau_j = 2^{j-1}$, $j = 
1, 2, 3$ and 4, and related differencing filters $d_{j,l}$ (right). The averaging filters are 
proportional to Haar scaling filters, and the differencing filters, to Haar wavelet 
filters. 


While averages of time series over various scales are of interest in their own 
right, what is often of more interest is how these averages change over time. 
For example, a key question about various indicators of climate is whether 
their average values over certain time scales have changed significantly with 
time. The wavelet transform is a mechanism that allows us to quantify how 
averages of a time series over particular scales change from one interval of 
time to the next. These changes are quantified in wavelet coefficients, which 
form the bulk of any DWT. 

Wavelet coefficients in a DWT are organized into sets. There is one set 
for each dyadic scale $\tau_j$, and each coefficient in this set is proportional to 
the difference between two adjacent nonoverlapping averages. Mathematically, 
these differences are given by 

$$D_{j,n} \equiv A_{j,n} - A_{j,n-\tau_j} = \sum_{l=0}^{2\tau_j - 1} d_{j,l} X_{n-l}, \quad n = 0, 1, \ldots, N - 1, \tag{2}$$

where 

$$d_{j,l} \equiv \begin{cases} 1/\tau_j, & l = 0, \ldots, \tau_j - 1; \\ -1/\tau_j, & l = \tau_j, \ldots, 2\tau_j - 1; \\ 0, & \text{otherwise.} \end{cases}$$

The right-hand portion of Fig. 1 shows the differencing filters $d_{j,l}$ associated 
with the averaging filters $a_{j,l}$. If $D_{j,n}$ is close to zero, then $A_{j,n-\tau_j}$ and $A_{j,n}$ 
are close to each other, indicating that there is not much change in these 
adjacent nonoverlapping averages of scale $\tau_j$; on the other hand, if $D_{j,n}$ has a 
large magnitude, then the two scale $\tau_j$ averages differ considerably. 
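
The differencing filter in (2) can be written out directly. The sketch below is only an illustration under the periodic assumption of this section; the names `differencing_filter` and `scale_difference` and the sample series are hypothetical.

```python
def differencing_filter(j):
    """Filter d_{j,l}: +1/tau_j for the first tau_j taps, -1/tau_j for the next tau_j."""
    tau = 2 ** (j - 1)
    return [1.0 / tau] * tau + [-1.0 / tau] * tau

def scale_difference(x, j, n):
    """D_{j,n}: difference of two adjacent nonoverlapping scale tau_j averages."""
    N = len(x)
    d = differencing_filter(j)
    return sum(d_l * x[(n - l) % N] for l, d_l in enumerate(d))

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]
print(scale_difference(x, 1, 1))  # x[1] - x[0]
print(scale_difference(x, 2, 3))  # (x[3] + x[2]) / 2 - (x[1] + x[0]) / 2
print(scale_difference(x, 2, 0))  # boundary value: wraps around to x[7], x[6], x[5]
```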

We can now define the Haar maximal overlap discrete wavelet transform 
(MODWT) of maximum level $J_0$, where $J_0$ is a positive integer that we are free 
to select. This transform consists of $J_0 + 1$ sets of $N$ coefficients, for a total of 
$(J_0 + 1) \times N$ coefficients in all. There are $J_0$ sets of wavelet coefficients, and the 
remaining set consists of the so-called scaling coefficients. For $j = 1, \ldots, J_0$, 
the wavelet coefficients are given by $\widetilde{W}_{j,n} = D_{j,n}/2$, while the single set of 
scaling coefficients is given by $\widetilde{V}_{J_0,n} = A_{J_0+1,n}$, where $n = 0, 1, \ldots, N - 1$ in 
both cases. Let $\mathbf{X}$ be an $N$ dimensional column vector containing the time 
series $X_n$, and let $\widetilde{\mathbf{W}}_j$ be a similar vector containing the level $j$ MODWT 
wavelet coefficients $\widetilde{W}_{j,n}$. We can then write 

$$\widetilde{\mathbf{W}}_j = \widetilde{\mathcal{W}}_j \mathbf{X}, \tag{3}$$

where $\widetilde{\mathcal{W}}_j$ is an $N \times N$ matrix whose rows can be deduced by studying (2). 
For example, if $N = 7$ and $j = 2$ so that $\tau_2 = 2$, we find that 


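$$\widetilde{\mathcal{W}}_2 = \begin{bmatrix}
 1/4 & 0 & 0 & 0 & -1/4 & -1/4 & 1/4 \\
 1/4 & 1/4 & 0 & 0 & 0 & -1/4 & -1/4 \\
 -1/4 & 1/4 & 1/4 & 0 & 0 & 0 & -1/4 \\
 -1/4 & -1/4 & 1/4 & 1/4 & 0 & 0 & 0 \\
 0 & -1/4 & -1/4 & 1/4 & 1/4 & 0 & 0 \\
 0 & 0 & -1/4 & -1/4 & 1/4 & 1/4 & 0 \\
 0 & 0 & 0 & -1/4 & -1/4 & 1/4 & 1/4
\end{bmatrix}.$$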


Note that any of the bottom six rows in $\widetilde{\mathcal{W}}_2$ can be obtained by circularly 
shifting the row above it to the right by one, a pattern that holds for all 
$\widetilde{\mathcal{W}}_j$. Note also that the first three rows yield boundary wavelet coefficients 
since they combine together values of the time series that are not contiguous 
in time (in general, there are $\min\{2\tau_j - 1, N\}$ boundary coefficients). In a 
similar manner, if $\widetilde{\mathbf{V}}_{J_0}$ is an $N$ dimensional column vector containing the 
scaling coefficients $\widetilde{V}_{J_0,n}$, then we can write 

$$\widetilde{\mathbf{V}}_{J_0} = \widetilde{\mathcal{V}}_{J_0} \mathbf{X}, \tag{4}$$

where $\widetilde{\mathcal{V}}_{J_0}$ is an $N \times N$ matrix whose rows are dictated by (1). 
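
The structure of the level $j$ wavelet matrix can be checked with a few lines of code. This is a sketch that assumes the Haar filter $d_{j,l}/2$ from (2) and periodic indexing; the function name `modwt_wavelet_matrix` is hypothetical. Exact fractions are used so that the entries print as in the $N = 7$, $j = 2$ example.

```python
from fractions import Fraction

def modwt_wavelet_matrix(N, j):
    """N x N matrix whose row n holds the weights applied to X to get W~_{j,n}."""
    tau = 2 ** (j - 1)
    # Haar MODWT wavelet filter of level j: d_{j,l} / 2.
    h = [Fraction(1, 2 * tau)] * tau + [Fraction(-1, 2 * tau)] * tau
    rows = [[Fraction(0)] * N for _ in range(N)]
    for n in range(N):
        for l, h_l in enumerate(h):
            rows[n][(n - l) % N] += h_l  # tap l multiplies X_{(n - l) mod N}
    return rows

W2 = modwt_wavelet_matrix(7, 2)
for row in W2:
    print(" ".join(f"{str(c):>5}" for c in row))
```

Each printed row is the previous one shifted circularly to the right by one position, and the first three rows are the ones that wrap around the end of the series.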

In practice the MODWT wavelet and scaling coefficients are not computed 
directly via (3) and (4), but rather via an efficient recursive procedure 
known as the pyramid algorithm (for pseudo-code describing this algorithm, 
see pp. 177-178 of [7]). 
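
For the Haar wavelet the recursion is short enough to sketch here. The code below is not the pseudo-code of [7]; it is a Haar-only illustration that relies on the definitions above: at level $j$ the current running averages are differenced and averaged at a lag of $\tau_j = 2^{j-1}$, with periodic indexing. The function name `haar_modwt` is an assumption.

```python
def haar_modwt(x, J0):
    """Haar MODWT of level J0 via a pyramid-style recursion.

    Returns (W, V): W[j - 1] holds the level-j wavelet coefficients and
    V holds the level-J0 scaling coefficients (running averages of x).
    """
    N = len(x)
    v = list(x)  # level-0 'scaling coefficients' are the series itself
    W = []
    for j in range(1, J0 + 1):
        lag = 2 ** (j - 1)
        w_j = [(v[n] - v[(n - lag) % N]) / 2 for n in range(N)]
        v = [(v[n] + v[(n - lag) % N]) / 2 for n in range(N)]
        W.append(w_j)
    return W, v

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]
W, V = haar_modwt(x, J0=2)
print(W[0])  # level 1: (X_n - X_{n-1}) / 2
print(W[1])  # level 2: D_{2,n} / 2
print(V)     # scale 4 averages A_{3,n}
```

Each level costs a number of operations proportional to $N$, whereas the matrix products in (3) and (4) cost on the order of $N^2$ per level.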


3 Analysis of Variance via the Wavelet Variance 

The MODWT leads to two basic decompositions for a time series $X_n$. The 
first is an analysis of variance (ANOVA) that is based on a decomposition 
of the ‘energy’ in $X_n$ (the second is discussed in Sect. 4). By definition the 
energy in a time series is just the sum of its squared values: 

$$\sum_{n=0}^{N-1} X_n^2 = \mathbf{X}^T \mathbf{X} = \|\mathbf{X}\|^2,$$

where ‘$T$’ denotes the transpose operation, and $\|\mathbf{X}\|$ is the Euclidean norm of 
$\mathbf{X}$. This decomposition states that 

$$\|\mathbf{X}\|^2 = \sum_{j=1}^{J_0} \|\widetilde{\mathbf{W}}_j\|^2 + \|\widetilde{\mathbf{V}}_{J_0}\|^2, \tag{5}$$

so the energy in the series is preserved in its MODWT wavelet and scaling 
coefficients. 

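Let $\hat\sigma_X^2$ be the sample variance for the time series: 

$$\hat\sigma_X^2 \equiv \frac{1}{N} \sum_{n=0}^{N-1} (X_n - \bar{X})^2 = \frac{1}{N} \sum_{n=0}^{N-1} X_n^2 - \bar{X}^2, \quad \text{where} \quad \bar{X} \equiv \frac{1}{N} \sum_{n=0}^{N-1} X_n.$$

It follows from (5) that 

$$\hat\sigma_X^2 = \sum_{j=1}^{J_0} \frac{1}{N} \|\widetilde{\mathbf{W}}_j\|^2 + \frac{1}{N} \|\widetilde{\mathbf{V}}_{J_0}\|^2 - \bar{X}^2. \tag{6}$$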

In the above, we refer to $\|\widetilde{\mathbf{W}}_j\|^2/N \equiv \hat\nu_j^2$ as the empirical wavelet variance. 
We can regard $\hat\nu_j^2$ as an appropriate definition for the sample variance of 
the level $j$ wavelet coefficients. This assumes that the mean value of $\widetilde{\mathbf{W}}_j$ 
can be taken to be zero, which is reasonable for certain $X_n$ because of the 
differencing operation inherent in the filters used in (2). On the other hand, 
$\frac{1}{N}\|\widetilde{\mathbf{V}}_{J_0}\|^2 - \bar{X}^2$ is the sample variance of the scaling coefficients because $\widetilde{\mathbf{V}}_{J_0}$ 
is a running average of $\mathbf{X}$ and hence has a sample mean of $\bar{X}$. Equation (6) 
thus gives us a scale-based ANOVA, in that we are breaking $\hat\sigma_X^2$ up into $J_0 + 1$ 
pieces, each of which can be interpreted in terms of sample variances of either 
differences in averages over the dyadic scales $\tau_1, \ldots, \tau_{J_0}$ or averages over a scale 
of $2\tau_{J_0} = \tau_{J_0+1}$. If $N = 2^J$ for some positive integer $J$ and if we set $J_0 = J$, 
the contribution to $\hat\sigma_X^2$ due to the scaling coefficients drops out because $\widetilde{\mathbf{V}}_{J_0}$ 
becomes a vector whose elements are all equal to $\bar{X}$, and we then have 

$$\hat\sigma_X^2 = \sum_{j=1}^{J_0} \frac{1}{N} \|\widetilde{\mathbf{W}}_j\|^2 = \sum_{j=1}^{J_0} \hat\nu_j^2. \tag{7}$$

Even if these stipulations on $N$ and $J_0$ are dropped, the above is still a good 
approximation as long as $J_0$ is large enough so that $\tau_{J_0+1}$ is close to $N$. 



Fig. 2: Two time series (left-hand plots), each with $N = 16$ values. Both series have 
the same sample means and variances. The right-hand plots show the corresponding 
Haar MODWT wavelet variances over the dyadic scales 1, 2, 4 and 8. 


To see how the wavelet variance can help characterize time series, consider 
the two artificial series shown in the left-hand column of plots in Fig. 2. By 
construction both series have exactly the same sample mean and variance, 
but their appearances are quite different. Series (a) varies more slowly than 
series (b), which tends to fluctuate back and forth from one time point to the 
next. The right-hand plots show the corresponding empirical wavelet 
variances versus the dyadic scales $\tau_1, \ldots, \tau_4$. The wavelet variances for the two 
series have their largest values at different scales, namely, scale $\tau_3 = 4$ for (a) 
and $\tau_1 = 1$ for (b). Small-scale fluctuations are thus an important part of the 
overall variability of series (b), but less so for (a), where larger scale 
fluctuations are more prominent. Although the sample mean and variance are here 
incapable of distinguishing between the two series, the scale-based ANOVA 
given by the wavelet variance can do so in a manner that is intuitively reasonable. 
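
The same effect can be reproduced with two toy series. The series below are hypothetical stand-ins chosen for this sketch, not the series of Fig. 2, and the function names are assumptions. Both have $N = 16$, sample mean 0 and sample variance 1; with $J_0 = 4$ the wavelet variances sum to the sample variance as in (7), but they are distributed across scales very differently.

```python
def haar_wavelet_variances(x, J0):
    """Empirical Haar MODWT wavelet variances ||W~_j||^2 / N for j = 1, ..., J0."""
    N = len(x)
    v = list(x)
    variances = []
    for j in range(1, J0 + 1):
        lag = 2 ** (j - 1)
        w = [(v[n] - v[(n - lag) % N]) / 2 for n in range(N)]
        v = [(v[n] + v[(n - lag) % N]) / 2 for n in range(N)]
        variances.append(sum(c * c for c in w) / N)
    return variances

def sample_variance(x):
    mean = sum(x) / len(x)
    return sum((c - mean) ** 2 for c in x) / len(x)

slow = [1.0] * 4 + [-1.0] * 4 + [1.0] * 4 + [-1.0] * 4  # sign changes every 4 points
fast = [1.0, -1.0] * 8                                   # sign changes at every point

for name, series in (("slow", slow), ("fast", fast)):
    nu2 = haar_wavelet_variances(series, J0=4)
    print(name, sample_variance(series), nu2, sum(nu2))
```

For the fast series all of the variance sits at unit scale, whereas for the slow series most of it is attributed to scales 2 and 4.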

The next two subsections consider ‘real world’ examples, both involving 
Arctic sea ice. Other examples of the use of the wavelet variance in geophysics 
include the study of the El Niño-Southern Oscillation [8], surface albedo 
and temperature in desert grasslands [9], soil variations [10], the relationship 
between rainfall and runoff [11], ocean surface waves [12], solar coronal 
activity [13], North Atlantic sea levels [14], atmospheric turbulence [15] and 
the impact of large multi-purpose dams on water temperature variability [16]. 


3.1 Wavelet Variance Analysis of Arctic Ice Types 

Naval submarines with upward-looking sonars have collected data on sea-ice 
thickness in the Arctic Ocean since 1958. Currently data from 34 cruises 
conducted by the U.S. Navy between 1975 and 2000 have been publicly archived. 
These data provide a unique direct look at the climatology of Arctic ice 
thickness as a function of space and time. The upper plot of Fig. 3 shows a 0.75 km 
portion of one such series of ice thickness measurements $X_n$ (in meters) taken 
near the North Pole in April of 1991 (the entire set of measurements extends 
over 50 km). We can regard $X_n$ as a time series with $\Delta = 0.001$ km, where 
here ‘time’ is considered as a surrogate for distance along the submarine track 
under the ice (the observations were recorded at regular intervals of time while 
the submarine was moving at a constant speed). 

Ice thickness can be classified into four types, which are driven by different 
physical processes [17, 18]. The first type consists of leads and new ice and 
has a thickness below 0.3 m; the second is first year ice and ranges from 0.3 to 
2 m; the third is medium multiyear ice, from 2 to 5 m; and the fourth is ridged 
ice, anything above 5 m. The divisions between the four types are marked on 
Fig. 3 by horizontal dashed lines. Let $X_n^{(i)}$ be a binary-valued series indicating 
the absence or presence (using 0 or 1) of ice type $i$ at measurement $X_n$. These 
four indicator series are plotted in the bottom of Fig. 3. 
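
Constructing the indicator series is a simple thresholding step, sketched below. The function names and the sample thicknesses are hypothetical, and assigning a measurement that falls exactly on a dividing value to the thicker class is an assumption of this sketch.

```python
def ice_type(thickness_m):
    """Classify a thickness (in meters) into ice types 1 to 4."""
    if thickness_m < 0.3:
        return 1  # leads and new ice
    if thickness_m < 2.0:
        return 2  # first year ice
    if thickness_m < 5.0:
        return 3  # medium multiyear ice
    return 4      # ridged ice

def indicator_series(thickness, i):
    """Binary series: 1 where the measurement is of ice type i, else 0."""
    return [1 if ice_type(x) == i else 0 for x in thickness]

thickness = [0.1, 0.3, 1.9, 2.0, 4.9, 5.0, 7.2]  # made-up values in meters
for i in (1, 2, 3, 4):
    print(i, indicator_series(thickness, i))
```

At every measurement exactly one of the four indicator series equals 1.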

Figure 4 shows empirical Haar wavelet variances $\hat\nu_j^2$ for the four indicator 
series $X_n^{(i)}$ plotted versus $\tau_j \Delta$ for $j$ ranging from 1 to 14 (i.e., physical scales 
from 0.001 up to 8.192 km). If we regard $\hat\nu_j^2$ as an estimate of a hypothesized 
theoretical wavelet variance, we can determine how far our estimates 
are likely to be off from the true wavelet variances (for details, see Chap. 8 
of [7]). The vertical lines in Fig. 4 indicate 95% confidence intervals (CIs) for 
the true wavelet variances for ice type 1 (the three other ice types would 
have CIs with similar widths). Note that the widths of the CIs increase as $\tau_j$ 
increases. 

Fig. 3: Portion of a series of Arctic ice thickness measurements $X_n$ versus distance 
along a submarine track (upper plot), along with four binary-valued series (lower) 
indicating the absence/presence (0/1) of four ice types: leads and new ice (defined 
as $X_n < 0.3$ m and denoted as $X_n^{(1)}$), first year ice ($0.3 \le X_n < 2$, $X_n^{(2)}$), medium 
multiyear ice ($2 \le X_n < 5$, $X_n^{(3)}$) and ridged ice ($X_n \ge 5$, $X_n^{(4)}$). The sampling 
interval is $\Delta = 0.001$ km. The horizontal dashed lines in the upper plot depict 
the defining boundaries for the ice types. These data were taken near the North 
Pole in April of 1991 and are archived at the National Snow and Ice Data Center 
(http://nsidc.org/). 

All four wavelet variance curves in Fig. 4 have a single broad peak. The 
largest $\hat\nu_j^2$ for each ice type is marked with a solid diamond. While the scale at 
which the largest value occurs is similar for types 2, 3 and 4 (either 16 or 32 m), 
the one for type 1 is an order of magnitude larger (256 m). We can consider 
the location of these peak values as defining a characteristic scale for each 
ice type. A question of geophysical interest is how stable these characteristic 
scales are both spatially and temporally. This question can be addressed by 
using $\hat\nu_j^2$ to determine these scales from data taken at other locations and 
times across the Arctic basin. For this application, the wavelet variance thus 
extracts a summary statistic that picks out the largest scale-based contributor 
to the sample variance of an ice-type indicator series, and this statistic can be 
studied across space and time to deduce possible changes in the climatology 
of Arctic ice thickness. 

Fig. 4: Empirical Haar wavelet variances $\hat\nu_j^2$ versus physical scales $\tau_j \Delta$ (in km), $j = 
1, 2, \ldots, 14$, for the four binary-valued ice type series $X_n^{(i)}$ shown in Fig. 3. The 
vertical lines emanating from each $\hat\nu_j^2$ for $X_n^{(1)}$ represent 95% confidence intervals 
for a hypothesized theoretical wavelet variance. The largest wavelet variances for 
each ice type are indicated by a solid diamond; their locations define wavelet-based 
characteristic scales. 

3.2 Wavelet Variance Analysis of Averaged Ice Thickness 

As a second example, let us consider another series of ice thickness 
measurements, but now consisting of one kilometer averages that have been 
detrended by subtracting off a line fit via least squares (the sampling interval is 
$\Delta = 1$ km). The residuals from this fit are plotted in Fig. 5. 

Fig. 5: Arctic ice thickness residuals $X_n$ versus distance along a submarine track. 
The residuals are the deviations from a least squares fit of a line to a series of 1 km 
average thicknesses (the sampling interval $\Delta$ is also 1 km). There are $N = 803$ 
thickness measurements, and these were collected from a Scientific ICe Expedition 
(SCICEX) cruise within the Arctic Ocean in September of 1997 and are archived at 
the National Snow and Ice Data Center (http://nsidc.org/). 

Fig. 6: Empirical Haar wavelet variances $\hat\nu_j^2$ versus physical scales $\tau_j \Delta$, $j = 
1, 2, \ldots, 9$, for the residual thickness series shown in Fig. 5. The vertical lines 
emanating from each $\hat\nu_j^2$ represent 95% confidence intervals for a hypothesized theoretical 
wavelet variance. The line through the variances is a least squares fit of $\log_{10}(\hat\nu_j^2)$ 
versus $\log_{10}(\tau_j \Delta)$ and has a slope of $-0.53$. 

The empirical wavelet variances $\hat\nu_j^2$ for $j = 1, \ldots, 9$ for the residual 
thicknesses are shown in Fig. 6, along with 95% confidence intervals for a 
hypothesized theoretical wavelet variance (the vertical lines). A linear least squares fit 
of $\log_{10}(\hat\nu_j^2)$ versus $\log_{10}(\tau_j \Delta)$ is also shown (the line with a slope of $-0.53$). 
With 1 km averaging, the largest wavelet variance occurs at the smallest 
scale, but what is of more interest is the rate of decay of $\hat\nu_j^2$ with increasing 
scale. This decay is very close to linear on a log/log scale. This form of 
decay is indicative of a stationary process whose spectral density function 
(SDF) $S(f)$ is approximately proportional to a power law $|f|^{\alpha}$, where $f$ is 
a Fourier frequency. For such a process, it can be argued that the theoretical 
wavelet variance should be approximately proportional to $\tau_j^{-\alpha-1}$, which 
implies that a log/log plot of $\hat\nu_j^2$ versus $\tau_j \Delta$ should be approximately linear, 
with a slope given by $-\alpha - 1$. The observed slope of $-0.53$ thus maps into a 
power-law exponent of $\alpha = -0.47$. A process whose SDF is proportional to 
$|f|^{-0.47}$ exhibits long-range dependence, which is characterized by an 
autocovariance function that decays at a slower rate than standard models such as 
autoregressive and/or moving average processes. This slower rate of decay has 
implications in assessing the sampling variability in various statistics derived 
from ice thickness measurements (for details, see [19, 20]). 
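
The mapping from a fitted log/log slope to a power-law exponent can be sketched as follows. The helper names are hypothetical, the fit is an ordinary unweighted least squares line, and the input variances are synthetic values that follow an exact power law rather than the ice thickness estimates.

```python
import math

def loglog_slope(scales, variances):
    """Least squares slope of log10(variance) versus log10(scale)."""
    xs = [math.log10(s) for s in scales]
    ys = [math.log10(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

def power_law_exponent(slope):
    """Invert slope = -alpha - 1 to obtain the SDF exponent alpha."""
    return -slope - 1.0

# Synthetic wavelet variances proportional to tau_j**(-0.53), j = 1, ..., 9.
scales = [2.0 ** (j - 1) for j in range(1, 10)]
variances = [s ** -0.53 for s in scales]
slope = loglog_slope(scales, variances)
print(round(slope, 2), round(power_law_exponent(slope), 2))
```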

4 Multiresolution Analysis 

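We now turn to the second basic decomposition afforded by the MODWT, 
which is an additive decomposition known in the wavelet literature as a 
multiresolution analysis (MRA). This decomposition says that we can reexpress 
$\mathbf{X}$ as the sum of $J_0 + 1$ new time series, each of which has a scale-based 
interpretation. In particular, define 

$$\widetilde{\mathcal{D}}_j = \widetilde{\mathcal{W}}_j^T \widetilde{\mathbf{W}}_j \quad \text{and} \quad \widetilde{\mathcal{S}}_{J_0} = \widetilde{\mathcal{V}}_{J_0}^T \widetilde{\mathbf{V}}_{J_0}, \tag{8}$$

where $\widetilde{\mathcal{D}}_j$ and $\widetilde{\mathcal{S}}_{J_0}$ are $N$ dimensional vectors known as, respectively, the $j$th 
level detail and the $J_0$th level smooth. We can now write 

$$\mathbf{X} = \sum_{j=1}^{J_0} \widetilde{\mathcal{D}}_j + \widetilde{\mathcal{S}}_{J_0}, \tag{9}$$

where $\widetilde{\mathcal{D}}_j$ is a time series reflecting variations in averages over a scale of $\tau_j$ in 
$\mathbf{X}$, whereas $\widetilde{\mathcal{S}}_{J_0}$ is a series reflecting averages over a scale of $\tau_{J_0+1}$. Note that 
we can recover our original time series $\mathbf{X}$ from its MODWT, which tells us 
that no information about the series has been lost in transforming it and that 
(8) constitutes the pieces of an inverse MODWT. Thus, if we know how a time 
series varies at the dyadic scales $\tau_1, \ldots, \tau_{J_0}$ and if we know its averages over 
a scale of $\tau_{J_0+1}$, then we can reconstruct the series perfectly. If we compare 
(9) to a level $J_0 + 1$ decomposition, namely, 

$$\mathbf{X} = \sum_{j=1}^{J_0+1} \widetilde{\mathcal{D}}_j + \widetilde{\mathcal{S}}_{J_0+1},$$

we can deduce that, for all $j$, 

$$\widetilde{\mathcal{S}}_j = \widetilde{\mathcal{S}}_{j+1} + \widetilde{\mathcal{D}}_{j+1}, \tag{10}$$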




D.B. Percival 


and hence the details can be interpreted as the differences between successive 
smooths. If $N = 2^J$ and if we again set $J_0 = J$, then (9) becomes 

$$\mathbf{X} = \sum_{j=1}^{J_0} \widetilde{\mathcal{D}}_j + \bar{X} \mathbf{1}, \tag{11}$$

where $\mathbf{1}$ is an $N$ dimensional vector, all of whose elements are ones. 
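
The additive decomposition can be verified numerically for the Haar wavelet. The sketch below assumes the filters of Sect. 2 with periodic indexing and applies each filter followed by its transpose, which is what (8) amounts to for circular filters; all function names are hypothetical and the sample series is arbitrary.

```python
def haar_filters(j):
    """Level-j Haar MODWT wavelet filter (d_{j,l} / 2) and scaling filter."""
    tau = 2 ** (j - 1)
    h = [0.5 / tau] * tau + [-0.5 / tau] * tau  # differences of scale tau averages
    g = [0.5 / tau] * (2 * tau)                  # averages over a scale of 2 * tau
    return h, g

def analyze(x, f):
    """Circular filtering: y_n = sum_l f_l x_{(n - l) mod N}."""
    N = len(x)
    return [sum(f_l * x[(n - l) % N] for l, f_l in enumerate(f)) for n in range(N)]

def synthesize(w, f):
    """Transpose of `analyze`: z_m = sum_l f_l w_{(m + l) mod N}."""
    N = len(w)
    return [sum(f_l * w[(m + l) % N] for l, f_l in enumerate(f)) for m in range(N)]

def haar_mra(x, J0):
    """Return ([detail_1, ..., detail_J0], smooth_J0) as defined in (8)."""
    details = []
    for j in range(1, J0 + 1):
        h, _ = haar_filters(j)
        details.append(synthesize(analyze(x, h), h))
    _, g = haar_filters(J0)
    smooth = synthesize(analyze(x, g), g)
    return details, smooth

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]
details, smooth = haar_mra(x, J0=2)
rebuilt = [sum(d[n] for d in details) + smooth[n] for n in range(len(x))]
print(rebuilt)  # matches x up to rounding, as stated by (9)
```

Running `haar_mra(x, 1)` and comparing its smooth with `smooth` plus `details[1]` gives a direct check of (10).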

As a simple example of an MRA, let us consider a series $\mathbf{X}$ of $N = 352$ 
oxygen isotope measurements from an ice core taken at one location on a 
spatial array with 3.5 to 7 km spacing in Dronning Maud Land, Antarctica. Here 
the spacing between observations is taken to be $\Delta = 0.5$ years (the raw 
measurements are indexed by distance along the core, but these are then mapped 
to values at half-year intervals). The series is plotted in the upper panel of 
Fig. 7 and has a temporal span of 176 years. For each of the cores in the array, 
an MRA was conducted in order to compare details with similar scales to 
ascertain which scales are dominated by environmental noise and which might 
contain a common signal (see [21] for details). Here we demonstrate that the 
simplest possible MRA for the core shown in Fig. 7 reveals some interesting 
properties not readily apparent in a plot of the data itself. 

Fig. 7: Oxygen isotope measurements from an Antarctic ice core (top panel), along 
with a Haar MODWT-based multiresolution analysis of level $J_0 = 1$ consisting of 
the smooth series $\widetilde{\mathcal{S}}_1$ and a single detail series $\widetilde{\mathcal{D}}_1$ (bottom panel, upper and lower 
plots, respectively). Due to the large difference between the beginning and end of the 
series, the MODWT was computed using so-called reflection boundary conditions 
rather than the periodic conditions described in Sect. 2 (for details, see p. 140 of 
[7]). Data are courtesy of Lars Karlöf, Norwegian Polar Institute, Polar Environmental 
Centre, Tromsø, Norway. 

The lower panel of Fig. 7 depicts a level $J_0 = 1$ Haar MODWT-based 
MRA, consisting of a smooth series $\widetilde{\mathcal{S}}_1$ and a single detail series $\widetilde{\mathcal{D}}_1$, which, 
upon being added together, yield $\mathbf{X}$. The smooth series is the portion of $\mathbf{X}$ 
that can be attributed to averages over a scale of a year, whereas the detail 
series represents variations over a half-year scale. What is interesting is that 
the local variability in $\widetilde{\mathcal{D}}_1$ increases gradually with the passage of time. This 
increase is not readily apparent in the plot of $\mathbf{X}$ itself, but the MRA pulls it out 
clearly. The physical mechanism behind this increase is not fully understood, 
but is thought to be due to diffusion. 

In addition the MRA reveals an artifact in $\widetilde{\mathcal{D}}_1$ centered at 1981, around 
which the detail series is flat for a stretch of 5.5 years. This is due to a linear 
interpolation scheme used to fill in a break in the ice core. Differencing a series 
of the form $X_n = a + bn$ leads to wavelet coefficients that are proportional to 
the slope $b$ and a detail series that is flat. For the questions that this MRA 
and those for other cores in the spatial array were used to address, filling 
in a small number of short gaps by linear interpolation is acceptable. Had 
we been interested in estimating the wavelet variance for a series with many 
gaps, linear interpolation could bias the estimates unacceptably towards zero. 
In this case it is advisable to use either a wavelet variance estimator that 
is specifically designed to work with gappy time series [22] or a stochastic 
interpolation scheme that preserves the small scale properties of the time 
series based upon a nominal stochastic model (see [19], Appendix B). 
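
The flat stretch is easy to reproduce. The following sketch uses the level 1 Haar MODWT with periodic indexing (an assumption of the sketch; the ice core analysis used reflection boundary conditions) on a purely linear series. The function name `haar_level1` and the values of $a$ and $b$ are hypothetical.

```python
def haar_level1(x):
    """Level-1 Haar MODWT wavelet coefficients and detail series of x."""
    N = len(x)
    # Wavelet coefficients (X_n - X_{n-1}) / 2; x[-1] wraps around for n = 0.
    w = [(x[n] - x[n - 1]) / 2 for n in range(N)]
    # Detail series: transpose of the same filter applied to w.
    d = [(w[m] - w[(m + 1) % N]) / 2 for m in range(N)]
    return w, d

a, b = 1.0, 0.5
x = [a + b * n for n in range(10)]
w, d = haar_level1(x)
print(w[1:])    # every non-boundary coefficient equals b / 2
print(d[1:-1])  # the detail series is flat (zero) away from the boundary
```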

Other examples of the use of MRAs in geophysics include the analysis of 
subtidal sea level fluctuations [23], magnetic storm activity [24], the Lisbon 
and Gibraltar North Atlantic Oscillation winter indices [25], spatial variation 
of microflora abundance in agricultural soil [26], the December 26th 2004 
tsunami as recorded along the southeastern coast of Brazil [27] and large-scale 
coherent structures in turbulent separation bubbles [28]. 


5 Orthonormal Discrete Wavelet Transform 

While a level $J_0$ MODWT of a time series of length $N$ consists of a total of
$(J_0 + 1) \times N$ values, it is also possible to define a discrete wavelet transform
that consists of just $N$ values. This transform is orthonormal, which means
that the transpose of the $N \times N$ matrix $\mathcal{W}$ relating the time series to the transform
coefficients is the inverse of $\mathcal{W}$. We hence use the acronym ‘ODWT’ to denote
this transform. We can readily define the ODWT in terms of the MODWT if
$N$ happens to be an integer multiple of $2^{J_0}$ (if $N$ is not of this form, an ODWT
can still be defined, but not as easily; see pp. 141-145 of [7] for details). The
ODWT wavelet coefficients are given by


$$W_{j,n} = 2^{j/2}\, \widetilde{W}_{j,\,2^j(n+1)-1}, \qquad n = 0, 1, \ldots, N_j - 1 \ \ (N_j \equiv N/2^j), \qquad j = 1, 2, \ldots, J_0;$$


i.e., the ODWT coefficients are obtained by subsampling and rescaling the
MODWT coefficients. For example, at level $j = 1$, the ODWT coefficients are
formed by taking the MODWT coefficients with odd indices and multiplying
them by $\sqrt{2}$, whereas, at level $j = 2$, we subsample every fourth coefficient
and multiply them by 2.
As the level $j$ increases, we need to subsample fewer and fewer MODWT
wavelet coefficients in order to create the corresponding ODWT coefficients
in $\mathbf{W}_j$. In a similar manner the ODWT scaling coefficients are defined by


$$V_{J_0,n} = 2^{J_0/2}\, \widetilde{V}_{J_0,\,2^{J_0}(n+1)-1}, \qquad n = 0, 1, \ldots, N_{J_0} - 1,$$


and can be placed in an $N_{J_0}$ dimensional vector denoted as $\mathbf{V}_{J_0}$. The ODWT
of level $J_0$ consists of the collection of vectors $\mathbf{W}_1$, $\mathbf{W}_2$, ..., $\mathbf{W}_{J_0}$ and $\mathbf{V}_{J_0}$,
whose dimensions are, respectively, $N/2$, $N/4$, ..., $N/2^{J_0}$ and $N/2^{J_0}$, which
collectively sum to $N$.
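
The subsample-and-rescale construction can be sketched in a few lines. This is an illustrative sketch only, assuming the Haar filters and circular filtering; the function names `haar_modwt` and `odwt_from_modwt` are hypothetical and not taken from [7].

```python
import numpy as np

def haar_modwt(x, J0):
    """Haar MODWT via circular filtering: wavelet coefficients for levels
    1..J0 (each of length N) and the level-J0 scaling coefficients."""
    v = np.asarray(x, dtype=float)
    W_tilde = []
    for j in range(1, J0 + 1):
        lagged = np.roll(v, 2 ** (j - 1))      # lagged[t] = v[t - 2^(j-1) mod N]
        W_tilde.append((v - lagged) / 2.0)     # level-j wavelet coefficients
        v = (v + lagged) / 2.0                 # level-j scaling coefficients
    return W_tilde, v

def odwt_from_modwt(W_tilde, V_tilde):
    """Keep the MODWT coefficients with indices 2^j (n + 1) - 1 and
    rescale them by 2^(j/2)."""
    J0 = len(W_tilde)
    W = [2 ** (j / 2) * Wt[2 ** j - 1 :: 2 ** j]
         for j, Wt in enumerate(W_tilde, start=1)]
    V = 2 ** (J0 / 2) * V_tilde[2 ** J0 - 1 :: 2 ** J0]
    return W, V

# Example with N = 16 (a multiple of 2^J0) and J0 = 3.
x = np.random.default_rng(0).standard_normal(16)
W_tilde, V_tilde = haar_modwt(x, 3)
W, V = odwt_from_modwt(W_tilde, V_tilde)
sizes = [len(w) for w in W] + [len(V)]
energy = sum(np.sum(w ** 2) for w in W) + np.sum(V ** 2)
```

The vector lengths in `sizes` add up to $N$, and `energy` matches the sum of squares of `x`, as expected for an orthonormal transform.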

As was true for the MODWT, the ODWT leads to a scale-based ANOVA
and an MRA. We start by considering the analogs of (3) and (4):

$$\mathbf{W}_j = \mathcal{W}_j \mathbf{X} \quad \text{and} \quad \mathbf{V}_{J_0} = \mathcal{V}_{J_0} \mathbf{X}, \qquad (12)$$

where $\mathcal{W}_j$ is an $N_j \times N$ matrix whose rows are selected rescaled rows from $\widetilde{\mathcal{W}}_j$,
while $\mathcal{V}_{J_0}$ is an $N_{J_0} \times N$ matrix whose rows are selected rescaled rows from $\widetilde{\mathcal{V}}_{J_0}$.
The ODWT-based ANOVAs and MRAs are easy to state: just remove all the
tildes from Eqs. (5) through (11)! In practice the ODWT wavelet and scaling
coefficients are not computed by subsampling the corresponding MODWT
coefficients, but rather via an efficient pyramid algorithm (pseudo-code for
this algorithm is given on pp. 100-101 of [7]).
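
For the Haar case the pyramid algorithm reduces to repeated pairwise sums and differences. The sketch below is an assumption-laden illustration rather than the pseudo-code of [7]: the function name `haar_odwt_pyramid` is hypothetical, and $N$ is assumed to be a multiple of $2^{J_0}$.

```python
import numpy as np

def haar_odwt_pyramid(x, J0):
    """Haar ODWT by the pyramid algorithm: at each level the current scaling
    coefficients are split into rescaled pairwise differences (wavelet
    coefficients) and rescaled pairwise sums (new scaling coefficients)."""
    v = np.asarray(x, dtype=float)
    W = []
    for _ in range(J0):
        even, odd = v[0::2], v[1::2]
        W.append((odd - even) / np.sqrt(2.0))
        v = (odd + even) / np.sqrt(2.0)
    return W, v

x = np.array([1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0])
W, V = haar_odwt_pyramid(x, 3)
energy = sum(np.sum(w ** 2) for w in W) + np.sum(V ** 2)
```

Each level works on half as many values as the previous one, so the total work is proportional to $N$, whereas the subsampling route first computes all $(J_0 + 1) \times N$ MODWT values.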

In general, MODWT-based ANOVAs and MRAs tend to perform better
than their ODWT equivalents because of the deleterious effect that subsampling
can have on the ODWT (for details, see Sects. 5.1, 5.6 and 8.3 of [7]);
however, the ODWT is the transform of choice for certain other types of analyses.
For example, if a time series can be modeled as a signal plus Gaussian
white noise, then its ODWT consists of a transformed signal plus Gaussian
white noise. Certain types of signals are more easily recognized in the ODWT
domain than in their original time domain representation, which makes it possible
to design effective data-adaptive procedures for extracting signals buried
in white noise. This fact is exploited in the large body of literature devoted
to wavelet shrinkage; see [29] for a recent review article that emphasizes this
use of the ODWT. (There is a device called ‘cycle spinning’ in which ODWT-based
signal extraction is applied to a time series and all its possible circular
shifts, followed by an averaging of the $N$ extracted signals. This procedure
is equivalent to a signal estimation procedure based upon the MODWT; for
details, see pp. 429-431 of [7].)

As a second example, the ODWT transforms certain - but not all - time
series into a collection of wavelet coefficients that are approximately uncorrelated
within and between levels, but that have possibly level-dependent
variances. Time series with long-range dependence are examples of ones that
are effectively decorrelated by the ODWT. This decorrelating property can
be put to good use in formulating wavelet-based approximate maximum likelihood
estimators of parameters associated with processes with long-range
dependence, in simulating series with long-range dependence and in formulating
bootstrap procedures for assessing the sampling variability in certain
statistics (for details, see [7, 30]).


6 Beyond the Haar Wavelet 

Our discussion so far has focused on the Haar MODWT and corresponding
ODWT, but there are other versions of both transforms. For a selected maximum
level $J_0$, these transforms can be formulated in terms of wavelet filters
of levels $j = 1, \ldots, J_0$ and a scaling filter of level $J_0$. Figure 8 shows the
level $j = 3$ wavelet filters for the Haar transform and LA(8) transform, where
‘LA(8)’ stands for the member of the Daubechies ‘least asymmetric’ family
whose level $j = 1$ filter has width $L = 8$ [31]. The shape of the Haar filter
tells us that the corresponding wavelet coefficients are proportional to differences
of adjacent simple averages of scale 4. The shape of the LA(8) filter
says that the wavelet coefficients can be interpreted as the difference between
a centrally located weighted average and weighted averages occurring before
and after it. Once the wavelet and scaling filters have been used to properly
formulate the matrices $\widetilde{\mathcal{W}}_j$, $\widetilde{\mathcal{V}}_{J_0}$, $\mathcal{W}_j$ and $\mathcal{V}_{J_0}$ of (3), (4) and (12), all of the
equations involving the Haar MODWT and ODWT presented in Sects. 2-5
also hold for the corresponding LA(8) transforms.

The LA(8) transform can yield a more informative ANOVA and MRA
than the Haar for certain time series because the latter can suffer from ‘leakage’
effects in which the wavelet coefficients for a particular scale are locked
into patterns driven by a nearby dominant scale. The fact that the wavelet
coefficients are highly correlated between different levels is undesirable because
the transform is then not successfully partitioning out different aspects
of a time series into different coefficients. In many geophysical applications,
including the ones used as examples in Sects. 3 and 4, an analysis based upon
the Haar wavelet is entirely adequate, and there is no need to consider other
wavelets. An effective procedure for deciding if the Haar wavelet is adequate
or not is to compare analyses based upon the Haar wavelet with those based
upon other wavelets. If the analyses are basically the same, there is no need
to use anything other than the Haar; if not, an analysis using something other
than the Haar wavelet might be called for. Use of non-Haar MODWTs and
ODWTs produces more boundary coefficients, so there is a price to pay in
abandoning a Haar-based analysis.

Fig. 8: Filters used to produce scale $\tau_3$ wavelet coefficients based upon the Haar
wavelet filter (top) and the Daubechies least asymmetric wavelet filter of width 8
(bottom).


7 Concluding Comments 

Hopefully the overview presented here has given the reader some idea of the
potential uses of DWTs in analyzing geophysical time series. There are many
aspects of wavelet analysis that we have not touched upon, including the fact
that all of the procedures we have discussed can be applied to time series
whose statistical properties are evolving over time. The ability of DWTs to
handle this case, which is alluded to briefly in the MRA for the oxygen series
presented in Fig. 7, is tied up with the fact that the wavelet coefficients extract
information not only across different scales, but also across time. For example,
a wavelet variance estimator in which the squared wavelet coefficients
are averaged locally rather than globally (as in the construction of $\hat{\nu}_j^2$) is an
effective way of studying time-varying properties in a time series. The reader
should consult [7] for details on this and other aspects of wavelet analysis not
covered in this brief overview.
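
A locally averaged wavelet variance can be sketched as a running mean of squared coefficients. The example below is a hypothetical illustration, not a procedure taken from [7]: it uses unit-level Haar MODWT coefficients, drops the circular boundary coefficient, and the window length of 16 and the test series are arbitrary choices.

```python
import numpy as np

def local_wavelet_variance(x, window):
    """Running mean of squared unit-level Haar MODWT coefficients."""
    x = np.asarray(x, dtype=float)
    w1 = (x[1:] - x[:-1]) / 2.0            # boundary coefficient omitted
    kernel = np.ones(window) / window
    return np.convolve(w1 ** 2, kernel, mode="valid")

# Test series whose small-scale variability triples halfway through.
signs = np.where(np.arange(64) % 2 == 0, 1.0, -1.0)
x = np.concatenate([signs, 3.0 * signs])

local_var = local_wavelet_variance(x, window=16)
global_var = np.mean(((x[1:] - x[:-1]) / 2.0) ** 2)
```

The global average `global_var` blends the two regimes into a single number, whereas `local_var` tracks the change from the first half of the series to the second.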

In Sect. 5 we discussed some of the relative strengths and weaknesses of the 
MODWT and the ODWT. These DWTs are closely related to corresponding 
continuous wavelet transforms (CWTs), which also are quite commonly used 
to analyze geophysical time series. A CWT might be called an ‘anti-statistic’
in the sense that, rather than summarizing the information in a time series, 
it converts it into a two-dimensional field. As a result, there is a considerable 
amount of redundant information in a CWT, which is both a strength and a 
weakness. One example where this redundancy is a strength is in the analysis 
of certain types of singularities (‘cusps’), where the nature of the singularity 
can be deduced by tracing the wavelet transform modulus maxima across a 
fine grid of scales (see, e.g., [32], Fig. 6.5). The dyadic scales used in DWTs are 
typically too coarsely spaced to make this type of singularity analysis feasible. 

The redundancy in the CWT, however, can make proper interpretation 
of ‘heat’ plots of the CWT problematic, i.e., scale versus time plots in which 
the magnitudes of the CWT coefficients are color-coded. These plots often 
have rather striking structures that our eyes are drawn toward, but that can 
be largely attributed to the fact that CWT coefficients are typically highly 
correlated both spatially and temporally. Proper statistical assessment of the
significance of these structures involves some subtle issues [33], particularly 
if they are picked out by eye prior to being assessed. Subsampling to the 
dyadic scales in the MODWT and ODWT essentially breaks this correlation 
structure spatially, and subsampling the MODWT to get the ODWT does the 
same temporally. The fact that collections of coefficients from these DWTs
are approximately uncorrelated makes it easier to devise statistical tests and to 
implement bootstrapping procedures (the latter are not feasible with CWTs). 

Finally we note that the CWT does not formally involve components in a 
time series that are handled in DWTs by the scaling coefficients. These are 
often useful for extracting large-scale trends that are an important part of 
some geophysical time series and that are a key component in wavelet-based 
signal extraction. 

Abstract. In numerical forecasting, unknown model parameters have been estimated
from a time series of observations by regarding them as extra state variables,
and applying standard data assimilation methods that use ensembles to represent
background error. In many situations, however, the use of ensembles is prohibitively
expensive and/or impracticable because of the inability to properly account for
model error in the initialization scheme. If one is seeking to estimate model parameters
as data is assimilated, it is possible to take advantage of the assumed
relative constancy of such parameters over large regions of time and space to derive
an estimate from a single realization. The approach follows from a general result on
synchronously coupled dynamical systems, where one system here represents “truth”
and the other “model”: If two such systems can be made to synchronize when their
corresponding parameters are identical, for any coupling scheme (such as might be
used in conventional data assimilation) a parameter estimation law can generally
be added that will dynamically reduce a total cost (Lyapunov) function including
parameter mismatch terms as well as state mismatch terms.

The approach is used to estimate a parameter that quantifies the effect of soil
moisture in a single-column version of the Weather Research and Forecasting (WRF)
model. The scheme can be extended to infer a 2D map of soil parameter values for
a 3D model, using the fact that the parameter is slowly varying almost everywhere.
Discontinuities are represented as additional degrees of freedom, and the Lyapunov
function is augmented so as to penalize for horizontal variations in the soil parameter
value except at locations of such discontinuities. The constrained optimization
approach that is proposed should be useful for a variety of parameter estimation
problems in numerical weather prediction (NWP), and will extend the power of
ensemble methods.


Keywords: Parameter estimation, Data assimilation, Chaos synchronization, 
Mesoscale forecasting 

1 Introduction

Any scheme for meteorological data assimilation has the goal of synchronizing 
a computational model with the real climate system, based on a limited set 
of noisy observations. Where the purpose is prediction of the state of the 
real system in the not-too-distant future, this synchronization view contrasts 
a priori with the usual view, in which observations are combined with the 
current model state to form the best possible estimate of the current state of 
the real system. 

In this chapter, we extend previous work on synchronization-based data 
assimilation, summarized in the next two sections, to show that parameters 
can be readily synchronized, as well as states. The synchronization view lends 
itself particularly well to the estimation of slowly varying parameters, a point 
made with a simplified, single-column version of an actual weather prediction 
model in Sect. 4. It is argued that the synchronization view also lends itself 
to the estimation of local parameter values that are slowly varying in space 
almost everywhere. In the concluding section, it is argued that the approach to 
parameter estimation can be extended to a more general scheme for machine 
learning. 

2 Background: Data Assimilation and Synchronized Chaos

The phenomenon of chaos synchronization [1] was first brought to light by 
Fujisaka and Yamada [2] and independently by Afraimovich et al. [3], but 
extensive research on the subject in the ’90s was spurred by the seminal work 
of Pecora and Carroll [4], who considered two chaotic systems in a master-slave 
relationship defined by a shared subsystem. Pecora and Carroll considered 
configurations such as the following combination of Lorenz systems: 


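$$\begin{aligned}
\dot{X} &= \sigma(Y - X) & & \\
\dot{Y} &= \rho X - Y - XZ & \dot{Y}_1 &= \rho X - Y_1 - X Z_1 \\
\dot{Z} &= -\beta Z + XY & \dot{Z}_1 &= -\beta Z_1 + X Y_1
\end{aligned} \qquad (1)$$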


which synchronizes rapidly, slaving the $Y_1, Z_1$-subsystem to the master $X, Y, Z$-subsystem,
as seen in Fig. 1, despite differing initial conditions and despite
sensitive dependence on initial conditions.
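
The complete-replacement scheme (1) is easy to reproduce numerically. The following is a minimal sketch under stated assumptions: a fixed-step fourth-order Runge-Kutta integrator, the standard parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$, and arbitrary (hypothetical) initial conditions.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def rhs(s):
    """Right-hand side of (1): master (X, Y, Z) and slaved (Y1, Z1)."""
    X, Y, Z, Y1, Z1 = s
    return np.array([
        SIGMA * (Y - X),
        RHO * X - Y - X * Z,
        -BETA * Z + X * Y,
        RHO * X - Y1 - X * Z1,     # driven by the master variable X
        -BETA * Z1 + X * Y1,
    ])

def rk4_step(s, dt):
    k1 = rhs(s)
    k2 = rhs(s + 0.5 * dt * k1)
    k3 = rhs(s + 0.5 * dt * k2)
    k4 = rhs(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

state = np.array([1.0, 1.0, 20.0, -8.0, 35.0])   # slave starts far from master
initial_error = np.hypot(state[1] - state[3], state[2] - state[4])

dt, n_steps = 0.01, 3000
for _ in range(n_steps):
    state = rk4_step(state, dt)

final_error = np.hypot(state[1] - state[3], state[2] - state[4])
```

Only $X$ is passed from the master to the slave, yet `final_error` is many orders of magnitude smaller than `initial_error`, while the master trajectory itself remains chaotic.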

If we imagine that the first Lorenz system represents the world, and that
the second Lorenz system is a predictive model, then synchronization effects
data assimilation of observed variables into the running model. The only observed
variable in the foregoing example is $X$, but that is sufficient to cause
the desired convergence of model to truth. Synchronization is known to be
tolerant of reasonable levels of noise, as might arise in the observation channel,
and occurs with partial coupling schemes that do not completely replace
a model variable with a variable of the observed system.

Fig. 1: The trajectories of the synchronously coupled Lorenz systems in the Pecora-Carroll
complete replacement scheme (1) rapidly converge (a). Differences between
corresponding variables approach zero (b).


Specifically, systems can also synchronize when coupled diffusively, as with
a pair of bidirectionally coupled Rössler systems:

$$\begin{aligned}
\dot{X} &= -Y - Z + \alpha(X_1 - X) & \dot{X}_1 &= -Y_1 - Z_1 + \alpha(X - X_1) \\
\dot{Y} &= X + aY & \dot{Y}_1 &= X_1 + aY_1 \\
\dot{Z} &= b + Z(X - c) & \dot{Z}_1 &= b + Z_1(X_1 - c)
\end{aligned} \qquad (2)$$

where $\alpha$ parametrizes the coupling strength. The diffusive coupling scheme
can be seen to resemble the “nudging” approach to data assimilation [5]. For
judicious choice of nudging coefficient, it can be seen to resemble 3DVar, and
for time-varying coefficient, Kalman filtering (as defined in e.g. [6]).

It is commonly not the existence, but the stability of the synchronization
manifold that distinguishes coupled systems exhibiting synchronization from
those that do not (such as (2) for different values of $\alpha$). $N$ Lyapunov exponents
can be defined for perturbations in the $N$-dimensional space that is transverse
to the synchronization manifold $M$. If the largest of these, $h^{\perp}_{\max}$, is negative,
then motion in the synchronization manifold is stable against transverse perturbations.
In that case, the coupled systems will synchronize for some range
of differing initial conditions. However, since $h^{\perp}_{\max}$ only determines local stability
properties, the size of the basin of attraction for the synchronized regime
remains unknown. As $h^{\perp}_{\max}$ is increased through zero, the system undergoes
a blowout bifurcation. For small positive values of $h^{\perp}_{\max}$, on-off synchronization
occurs (a special case of on-off intermittency), as illustrated in Fig. 2b,
where degradation results from a time lag in the coupling. The other panels
of Fig. 2 show an increasing rate of bursting as the time-lag increases. Vestiges
of synchronization are discernible even far from the blowout bifurcation
point (Fig. 2c), a phenomenon that was used to predict new teleconnection
patterns [7].

Fig. 2: The difference between the simultaneous states of two Lorenz systems with
time-lagged coupling, represented by $Z(t) - Z_1(t)$ vs. $t$ for various values of the inverse
time-lag $\Gamma$, illustrating complete synchronization (a), intermittent or “on-off”
synchronization (b), partial synchronization (c), and de-coupled systems (d). Average
Euclidean distance (D) between the states of the two systems in $X, Y, Z$-space
is also shown. The trajectories are generated by adaptive Runge-Kutta numerical
integrations with $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$.


The early work on synchronized chaos was spurred by an intended application
to secure communications, since the signal connecting the two synchronized
systems can be difficult to distinguish from noise. Practical applications
to cryptography have not emerged, largely because system parameters can
be extracted from the coupling signal, with some effort, for low-dimensional
systems. But the essence of the phenomenon is that two systems that are
effectively unpredictable, connected by a signal that may be almost indecipherable,
can still exhibit significant correlations. It is argued that this phenomenon
makes weather prediction possible, and will be more generally useful
for real-time computational modeling of natural systems.

3 Background: Synchronization in Geophysical Fluid Dynamics

Synchronization in geophysical fluid models was demonstrated by Duane and
Tribbia [8], originally with a view toward predicting and explaining new families
of long-range teleconnections [9]. The uncoupled single-system model in
this work was derived from one described by Vautard et al. [10].

The model is given by the quasigeostrophic equation for potential vorticity
$q$ in a two-layer reentrant channel on a $\beta$-plane:

$$\frac{Dq_i}{Dt} \equiv \frac{\partial q_i}{\partial t} + J(\psi_i, q_i) = F_i + D_i \qquad (3)$$

where the layer $i = 1, 2$, $\psi$ is streamfunction, and the Jacobian
$J(\psi, \cdot) = \frac{\partial \psi}{\partial x}\frac{\partial\, \cdot}{\partial y} - \frac{\partial \psi}{\partial y}\frac{\partial\, \cdot}{\partial x}$
gives the advective contribution to the Lagrangian derivative
$D/Dt$. Equation (3) states that potential vorticity is conserved on a moving
parcel, except for forcing $F_i$ and dissipation $D_i$. The discretized potential
vorticity is

$$q_i = f_0 + \beta y + \nabla^2 \psi_i + R_i^{-2}(\psi_1 - \psi_2)(-1)^i \qquad (4)$$

where $f(x,y)$ is the vorticity due to the Earth’s rotation at each point $(x,y)$,
$f_0$ is the average $f$ in the channel, $\beta$ is the constant $df/dy$ and $R_i$ is the
Rossby radius of deformation in each layer. The forcing $F$ is a relaxation
term designed to induce a jet-like flow near the beginning of the channel:
$F_i = f_0(q_i^* - q_i)$ for $q_i^*$ corresponding to the choice of $\psi^*$ shown in Fig. 3a.
The dissipation terms $D_i$, boundary conditions, and other parameter values
are given in Ref. [9].

Two models of the form (3), $Dq^A/Dt = F^A + D^A$ and $Dq^B/Dt = F^B + D^B$,
were coupled diffusively in one direction by modifying one of the forcing terms:

$$F^B_{\mathbf{k}} = f^B \left[ a_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) + b_{\mathbf{k}} (q^*_{\mathbf{k}} - q^B_{\mathbf{k}}) \right] \qquad (5)$$

where the flow has been decomposed spectrally and the subscript $\mathbf{k}$ on each
quantity indicates the wave number $\mathbf{k}$ spectral component. (The layer index $i$
has been suppressed.) The two sets of coefficients $a_{\mathbf{k}}$ and $b_{\mathbf{k}}$ were chosen to couple
the two channels in some medium range of wavenumbers and to force each
channel only with the low wavenumber components of the background flow:

$$a_{\mathbf{k}} = \begin{cases}
0 & \text{if } |k_x| \le k_{x0} \text{ and } |k_y| \le k_{y0} \\
(k_n/|\mathbf{k}|)^4 & \text{if } |\mathbf{k}| > k_n \\
1 - (k_0/|\mathbf{k}|)^4 & \text{otherwise}
\end{cases}$$

$$b_{\mathbf{k}} = \begin{cases}
1 - a_{\mathbf{k}} & \text{if } |\mathbf{k}| \le k_n \\
0 & \text{if } |\mathbf{k}| > k_n
\end{cases}$$

as in Ref. [9], where the constants $k_0$, $k_{x0}$, $k_{y0}$ and $k_n$ are defined.
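
The piecewise definitions of the coupling and forcing coefficients translate directly into a small function. This is a sketch under assumptions: $|\mathbf{k}|$ is taken to be the Euclidean norm of $(k_x, k_y)$, the defaults $k_{x0} = 3$, $k_{y0} = 2$ and $k_n = 15$ are read off the caption of Fig. 3, and the value of `k0` used in the example is purely hypothetical (the actual constant is defined in Ref. [9]).

```python
import numpy as np

def coupling_coefficients(kx, ky, k0, kx0=3, ky0=2, kn=15):
    """Return (a_k, b_k) for one spectral component.

    a_k weights the coupling to the other channel; b_k weights the
    relaxation toward the background flow q*.
    """
    k = np.hypot(kx, ky)               # |k|, assumed Euclidean
    if abs(kx) <= kx0 and abs(ky) <= ky0:
        a = 0.0                        # large scales: no coupling
    elif k > kn:
        a = (kn / k) ** 4              # small scales: coupling tapers off
    else:
        a = 1.0 - (k0 / k) ** 4        # medium scales: nearly full coupling
    b = 1.0 - a if k <= kn else 0.0
    return a, b

K0 = 2.0                               # hypothetical value for illustration
large_scale = coupling_coefficients(1, 1, K0)
medium_scale = coupling_coefficients(5, 5, K0)
small_scale = coupling_coefficients(20, 0, K0)
```

For wavenumbers up to $k_n$ the two coefficients sum to one, so each component is either relaxed toward the background flow, coupled to the other channel, or a blend of the two; beyond $k_n$ the background forcing is switched off and the coupling decays as $|\mathbf{k}|^{-4}$.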

It was found that the two channels thus coupled rapidly synchronize
(Fig. 3), starting from initial flow patterns that are arbitrarily set equal to
the forcing in one channel, and to a different pattern in the other channel.
(Results are shown for bidirectional coupling defined by adding an equation
for $F^A_{\mathbf{k}}$ analogous to (5). The synchronization behavior for coupling in just one
direction is very similar.) With unidirectional coupling, the synchronization
effects data assimilation from the A channel into the B channel.

Fig. 3: Streamfunction (in units of $1.48 \times 10^9$ m$^2$ s$^{-1}$) describing the forcing $\psi^*$ (a,b),
and the evolving flow $\psi$ (c-f), in a parallel channel model with bidirectional coupling
of medium scale modes for which $|k_x| > k_{x0} = 3$ or $|k_y| > k_{y0} = 2$, and $|\mathbf{k}| < 15$,
for the indicated numbers $n$ of time steps in a numerical integration. Parameters
are as in Ref. [9]. An average streamfunction for the two vertical layers $i = 1, 2$ is
shown. Synchronization occurs by the last time shown (e,f), despite differing initial
conditions.


4 Parameter Adaptation in a Mesoscale Model 

Machine learning might also be realized in the synchronization context, so as 
to correct for deterministic model error in the resolved degrees of freedom. By 
allowing model parameters to vary slowly, generalized synchronization that is 
defined by a complex non-identical correspondence between variables in the 
two models would be transformed to more nearly identical synchronization. 
Indeed, parameter adaptation laws can be added to a synchronously coupled 
pair of systems so as to synchronize the parameters as well as the states. Parlitz
[11] showed for example that two unidirectionally coupled Lorenz systems with 
different parameters: 

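$$\begin{aligned}
\dot{X} &= \sigma(Y - X) & \dot{X}_1 &= \sigma(Y - X_1) \\
\dot{Y} &= \rho X - Y - XZ & \dot{Y}_1 &= \rho_1 X_1 - \nu Y_1 - X_1 Z_1 + \mu \\
\dot{Z} &= -\beta Z + XY & \dot{Z}_1 &= -\beta Z_1 + X_1 Y_1
\end{aligned} \qquad (6)$$

could be augmented with parameter adaptation rules:

$$\begin{aligned}
\dot{\rho}_1 &= (Y - Y_1) X_1 \\
\dot{\nu} &= (Y_1 - Y) Y_1 \\
\dot{\mu} &= Y - Y_1
\end{aligned} \qquad (7)$$

so that the Lorenz systems would synchronize, and additionally $\rho_1 \to \rho$, $\nu \to 1$,
and $\mu \to 0$.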
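
The adaptation rules can be tried out in a short simulation. The sketch below is illustrative only and rests on several assumptions: the three adapted parameters are named `rho1`, `nu` and `mu`, with `mu` entering the $Y_1$ equation as an additive offset; the integrator is a fixed-step fourth-order Runge-Kutta scheme; and the initial conditions are arbitrary. The function `cost` is a hypothetical diagnostic that sums squared state and parameter errors.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0    # parameters of the "true" system

def rhs(s):
    """Coupled Lorenz systems with adaptation of rho1, nu and mu."""
    X, Y, Z, X1, Y1, Z1, rho1, nu, mu = s
    return np.array([
        SIGMA * (Y - X),
        RHO * X - Y - X * Z,
        -BETA * Z + X * Y,
        SIGMA * (Y - X1),                    # model is driven by the observed Y
        rho1 * X1 - nu * Y1 - X1 * Z1 + mu,
        -BETA * Z1 + X1 * Y1,
        (Y - Y1) * X1,                       # adaptation of rho1
        (Y1 - Y) * Y1,                       # adaptation of nu
        Y - Y1,                              # adaptation of mu
    ])

def rk4_step(s, dt):
    k1 = rhs(s)
    k2 = rhs(s + 0.5 * dt * k1)
    k3 = rhs(s + 0.5 * dt * k2)
    k4 = rhs(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def cost(s):
    """Sum of squared state errors and squared parameter errors."""
    X, Y, Z, X1, Y1, Z1, rho1, nu, mu = s
    return ((X1 - X) ** 2 + (Y1 - Y) ** 2 + (Z1 - Z) ** 2
            + (rho1 - RHO) ** 2 + (nu - 1.0) ** 2 + mu ** 2)

# Model starts with wrong parameters (rho1 = 20, nu = 2, mu = 3).
state = np.array([1.0, 1.0, 20.0, 1.0, 3.0, 15.0, 20.0, 2.0, 3.0])
initial_cost = cost(state)

dt, n_steps = 0.01, 10000
for _ in range(n_steps):
    state = rk4_step(state, dt)

final_cost = cost(state)
rho1_est, nu_est, mu_est = state[6:]
```

With $X_1 = X$ initially, the combined cost plays the role of a Lyapunov function: its time derivative works out to $-2(Y_1 - Y)^2 - 2\beta (Z_1 - Z)^2$, so it can only decrease, and the estimates `rho1_est`, `nu_est` and `mu_est` drift toward the true values as the states synchronize.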

Equations for a synchronously coupled pair of systems can in fact always
be augmented to allow parameter adaptation as well, provided that relevant
dynamical variables are observed, as shown by Duane, Yu, and Kocarev [12].
Consider, for example, two quasigeostrophic channel models of the form (3)
coupled according to (5), which are known to synchronize, as discussed in the
background section. The model parameter to be estimated, $f^B$, is an overall
coefficient in the forcing term

$$F^B_{\mathbf{k}} = f^B a_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) + f^B b_{\mathbf{k}} (q^*_{\mathbf{k}} - q^B_{\mathbf{k}}) \qquad (8)$$

where the layer index $i$ is suppressed and the coefficients $a_{\mathbf{k}}$, $b_{\mathbf{k}}$ are slightly
smoothed step functions of $\mathbf{k}$, as before, so that each spectral component is
either coupled to the corresponding component in the A system or to the
background flow $q^*$ or neither. The coefficients are chosen as before so as to
couple only the medium-scale components. (The forcing for the A system is
correspondingly: $F^A_{\mathbf{k}} = f^A a_{\mathbf{k}} (q^*_{\mathbf{k}} - q^A_{\mathbf{k}})$.)

The parameter estimation rule in spectral space:

$$\dot{f}^B = \sum_{\mathbf{k} \in S} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) \left[ a_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) + b_{\mathbf{k}} (q^*_{\mathbf{k}} - q^B_{\mathbf{k}}) \right] \qquad (9)$$

for a restricted range of wavenumbers in $S$, as in the figure, causes $f^B$ to
converge to $f^A$, as shown in Fig. 4.

Fig. 4: The evolving flow $\psi$ (a-f) for two quasigeostrophic channel models that are
synchronously coupled as in [8, 9] (but in one direction only), and with the forcing
parameter $f^B$ for the second channel (denoted $\mu_0$ in the reference) allowed to vary
according to the truncated parameter adaptation rule (9) with $S = \{\mathbf{k} : |k_x|, |k_y| \le 12\}$
(in waves per channel-length). Starting from the initial value $f^B = 3.0$ at time step
$n = 0$ (not shown), $f^B$ converges to the value of the corresponding parameter
$f^A = 0.3$ in the first channel, as the flows synchronize. (An average of the two layers
$i = 1, 2$ is shown.) Panel labels recovered from the figure: channel A, f = 0.30000;
channel B, n = 400, f = 1.88246.

The derivation of the rule (9) is instructive. One chooses the parameter
adaptation rule so that a Lyapunov function that quantifies both state error
and parameter error is monotonically decreasing, using the fact that a Lyapunov
function for the identical-parameter situation is known to be monotonically
decreasing, since the identical systems synchronize. The latter, “core”
Lyapunov function, $L_0(q^A, q^B)|_{f^A = f^B} \equiv \int d^2x\, (q^A - q^B)^2$, is known to be
decreasing at any point $(q^A, q^B)$ in a large region of the coupled-system state
space. Consider the more general Lyapunov function for the case of parameter
mismatch:

$$L(q^A, q^B, f^A, f^B) \equiv (f^A - f^B)^2 + \int d^2x\, (q^A - q^B)^2 \qquad (10)$$

and compute the time derivative

$$\dot{L} = -2 \dot{f}^B (f^A - f^B) + \frac{d}{dt} \int d^2x\, (q^A - q^B)^2. \qquad (11)$$

The second term on the right-hand side can be expanded using the dynamical
equation (3), with the forcing term as defined in (8), to compute the extra
contribution to $\dot{q}^B$ from the time-varying coefficient $f^B$:

$$\begin{aligned}
\frac{d}{dt} \int d^2x\, (q^A - q^B)^2 &= 2 \int d^2x\, (\dot{q}^A - \dot{q}^B)(q^A - q^B) \\
&= \dot{L}_0 |_{f^B = f^A} + 2 (f^A - f^B) \int d^2x\, (q^A - q^B) \sum_{\mathbf{k}} \left[ a_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) + b_{\mathbf{k}} (q^*_{\mathbf{k}} - q^B_{\mathbf{k}}) \right] e^{i \mathbf{k} \cdot \mathbf{x}} \\
&= \dot{L}_0 |_{f^B = f^A} + 2 (f^A - f^B) \sum_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) \left[ a_{\mathbf{k}} (q^A_{\mathbf{k}} - q^B_{\mathbf{k}}) + b_{\mathbf{k}} (q^*_{\mathbf{k}} - q^B_{\mathbf{k}}) \right]
\end{aligned} \qquad (12)$$

where the sum on the second line, multiplied by $f^A - f^B$, is the extra contribution
to $F$ in (3). From (11) and (12), it is seen that if we choose the
adaptation law (9), with $S$ universal, we will have

$$\dot{L} = \dot{L}_0 |_{f^B = f^A} < 0 \qquad (13)$$

as desired. The adaptation law for restricted $S$ can be derived by using a
different, correspondingly truncated Lyapunov function.

It is also instructive to consider the effect of a simpler parameter adaptation
law. If one ignores the occurrence of the parameter $f^B$ in the coupling of
the two channels and only retains the first term in the forcing (8), then the
parameter adaptation law that guarantees (13) is:

$$\dot{f}^B = \int d^2x\, (q^* - q^B)(q^A - q^B) \qquad (14)$$

Under the truncated adaptation rule (14) the monotonic convergence of $f^B$ to
the correct value $f^A$ (Fig. 5a) is replaced by oscillatory convergence, as plotted
in Fig. 5b. The robustness of the general approach to parameter estimation is
apparent in this example.

Fig. 5: (a) Convergence of $f^B$ to $f^A = 0.3$ (dashed line) for the synchronously
coupled quasigeostrophic channels, displayed in Fig. 4, and (b) convergence for the
simplified parameter adaptation rule (14). Monotonic convergence is replaced by
oscillatory convergence.

The general approach that we have illustrated was formalized by Duane
et al. [12]. Consider a “real system” given by:

$$\dot{x} = f(x, p), \qquad (15)$$

$$\dot{p} = 0, \qquad (16)$$

where $x \in \mathbb{R}^N$, $f : \mathbb{R}^N \to \mathbb{R}^N$, and $p \in \mathbb{R}^m$ is the vector of (unknown,
constant) parameters of the system. We further assume that $s = s(x)$, where
$s : \mathbb{R}^N \to \mathbb{R}^n$, $n < N$, is an $n$ dimensional vector representing the experimental
measurement output of the system. A “computational model” of the
system (15) is given by:

$$\dot{y} = f(y, q) + u(y, s), \qquad (17)$$

$$\dot{q} = \mathcal{N}(y, x - y), \qquad (18)$$

where $\mathcal{N}(y, 0) = 0$, and $u$ is the control signal. Let $e \equiv y - x$ and $r \equiv q - p$.
Choose a positive definite Lyapunov function $L_0(e)|_{q = p}$. Assume that
the control signal $u$ is designed such that there is some time $t_0$ for which
$\dot{L}_0(e(t))|_{q = p} < 0$ when $e(t) \ne 0$ and $\dot{L}_0(e(t))|_{q = p} = 0$ when $e(t) = 0$, for
all $t > t_0$. That is, after time $t_0$, the system proceeds monotonically toward
synchronization. Let $h \equiv f(y, r + p) - f(y - e, p)$. Duane et al. [12] established
the following theorem:

Theorem 1. Assume that (i) the control law $u$ in (17) is designed such that
the synchronization manifold $x = y$ is globally asymptotically stable, (ii) $f$
is linear in the parameters $p$, and (iii) the parameter estimation law (18) is
designed such that

$$\mathcal{N}_j = -\delta_j \sum_i \frac{\partial L_0}{\partial e_i} \frac{\partial h_i}{\partial r_j},$$

where $\delta_j$ are positive constants. Then the synchronization manifold $y = x$,
$p = q$ is globally asymptotically stable.
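
The reasoning behind such an estimation law can be sketched in a few lines, in the same way as the channel-model derivation of (9). The sketch assumes that the law has the form $\dot{q}_j = -\delta_j \sum_i (\partial L_0/\partial e_i)(\partial h_i/\partial r_j)$ and introduces an augmented function $L$ with weights $1/(2\delta_j)$; both are assumptions made here for illustration and are not quoted from [12].

```latex
% Augmented Lyapunov function (assumed form):
L(e, r) = L_0(e) + \sum_j \frac{r_j^2}{2\,\delta_j}

% Error dynamics, from (15) and (17), with h = f(y, r + p) - f(y - e, p):
\dot{e} = h + u

% Because f is linear in the parameters, h splits into its value at r = 0
% plus a term linear in r:
h_i = h_i\big|_{r=0} + \sum_j \frac{\partial h_i}{\partial r_j}\, r_j

% Time derivative of L, using \dot{r}_j = \dot{q}_j since \dot{p} = 0:
\dot{L} = \dot{L}_0\big|_{q=p}
        + \sum_j r_j \left[ \sum_i \frac{\partial L_0}{\partial e_i}
                             \frac{\partial h_i}{\partial r_j}
                           + \frac{\dot{q}_j}{\delta_j} \right]

% With the assumed law the bracket vanishes term by term, leaving
\dot{L} = \dot{L}_0\big|_{q=p} \le 0
```

The cross terms between state error and parameter error cancel exactly, so the augmented function inherits the monotonic decrease of the core function $L_0$. With $L_0 = \int d^2x\,(q^A - q^B)^2$ and $\delta = 1/2$, this construction reproduces the channel-model rule (9).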

The theorem ensures the stability of the synchronization manifold $y = x$,
$p = q$. It says that if the two systems synchronize for the case of identical
parameters, then the parameters of the “real system” can be estimated when
they are not known a priori, provided that each partial derivative $\partial L_0/\partial e_i$
is known for which the vector $\partial h_i/\partial r_j$ ($j = 1, \ldots$) is not zero. For the usual
form $L_0 = \sum_i (e_i)^2$, the requirement is that $x_i$ be known if the equation for
$y_i$ contains parameters that one seeks to estimate. By considering a more
general Lyapunov function that is defined in terms of some subset $S$ of the
state variables, or their indices, $L_0 = \sum_{i \in S} c_i (e_i)^2$ for positive coefficients
$c_i$, one obtains the looser requirement for each desired parameter, that $x_i$ be
known for at least some $i$ for which the $\dot{y}_i$ equation contains that parameter.
(Convergence may be slower if fewer $x_i$ are known.)

As a more realistic example than the quasigeostrophic channel model, the
Weather Research and Forecasting (WRF) model was considered, as adapted
for weather prediction over military test ranges for the Army Test and Evaluation
Command (ATEC). The ATEC application is based on observations
that are so frequent that they can be assumed to occur at every numerical
time step, so that a continuously coupled differential equation system can be
taken to reflect the actual data assimilation scenario.

At a relevant level of model detail, the prognostic equation for humidity
(water vapor mixing ratio) $q$ is:

$$\frac{\partial q}{\partial t} = \frac{\partial}{\partial z} \left\{ K \left[ \frac{\partial q}{\partial z} - M f(u_0, T_0, \ldots) \right] \right\} \qquad (19)$$

where $K$ is a moisture diffusivity, and $M = M(x,y)$ quantifies the impact of
soil moisture at each location $(x,y)$, which is a function $f$ of state variables
such as the zonal wind $u_0$, temperature $T_0$, etc. at the surface. To study the
estimation of $M$ using the synchronization method, attention is restricted to
a single vertical column $(x,y) = (x_0,y_0)$ and a model is introduced that is
diffusively coupled to (“nudged” by) the true state. The model humidity $q_m$,
for instance, is governed by:

$$\frac{\partial q_m}{\partial t} = \frac{\partial}{\partial z} \left[ K \left( \frac{\partial q_m}{\partial z} - M f(u_{m0}, T_{m0}, \ldots) \right) \right] + c\,(q_{obs} - q_m) \qquad (20)$$

where $q_{obs}$ is the observed humidity (at any level $z$ where an observation is
taken), that is, the sum of the true $q$ and observational noise, and $c$ is a coupling
(“nudging”) coefficient. Similar equations govern the evolution of temperature
$T$, wind speed $u$, and other model variables, but the parameter $M$ is thought
to enter only the humidity equation (20).

In accordance with the theorem, extended to PDEs in a straightforward
way, $M$ for the model was made to vary with observational input as:

$$\dot{M} \sim -\frac{\partial K}{\partial z}\, f(u_{m0}, T_{m0}, \ldots)\, \big( q_{obs}(z) - q_m(z) \big) \qquad (21)$$

for the case of observations taken at just one level $z$. For observations at
multiple levels, (21) is simply averaged over several values of $z$.
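
The structure of a nudged model with an added parameter law can be illustrated on a toy problem. The sketch below is not the WRF configuration: it is a hypothetical scalar system in which the unknown parameter multiplies a known forcing `sin(t)`, observations are noise-free, and the values of `P_TRUE`, `C` and `DELTA` are arbitrary. As in (21), the parameter tendency is the product of the model's sensitivity to the parameter and the observation-minus-model difference.

```python
import numpy as np

P_TRUE = 1.5      # parameter of the "real system" (unknown to the model)
C = 1.0           # nudging coefficient
DELTA = 1.0       # adaptation gain (a positive constant)

def rhs(t, s):
    x, y, q = s
    g = np.sin(t)                        # known forcing; sensitivity of the
                                         # model tendency to the parameter
    return np.array([
        -x + P_TRUE * g,                 # truth
        -y + q * g + C * (x - y),        # model, nudged toward the observation
        2.0 * DELTA * g * (x - y),       # parameter law: sensitivity * error
    ])

def rk4_step(t, s, dt):
    k1 = rhs(t, s)
    k2 = rhs(t + 0.5 * dt, s + 0.5 * dt * k1)
    k3 = rhs(t + 0.5 * dt, s + 0.5 * dt * k2)
    k4 = rhs(t + dt, s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

state = np.array([0.0, 1.0, -0.5])       # x, y and the initial guess for q
initial_param_error = abs(state[2] - P_TRUE)

t, dt = 0.0, 0.01
for _ in range(6000):
    state = rk4_step(t, state, dt)
    t += dt

final_state_error = abs(state[1] - state[0])
final_param_error = abs(state[2] - P_TRUE)
```

For this toy system the function $(y - x)^2 + (q - p)^2 / (2\delta)$ decreases at the rate $-2(1 + c)(y - x)^2$, and because the forcing keeps oscillating the parameter error is driven to zero along with the state error. Only a single model realization is integrated; no ensemble is involved.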

Repeated convergence of $M$ to its true value, each time followed by a burst
away from synchronization, is seen in Fig. 6. The behavior differs from the
smooth convergence in the channel model example because the state variables
do not converge in the time interval under consideration (Fig. 6e) - in
contrast to the nearly complete synchronization of the two channel models.
Importantly, in contrast to the example of the channel model, the prognostic
equation (19) is an idealization of the behavior of the adapted WRF model,
as implemented in software, the details of which were not completely known
to the authors.

Fig. 6: The variable model parameter M converges to the true value Mt with repeated
“bursting”, in the dynamical parameter adaptation scheme that only requires
a single realization. M - Mt is plotted for several cases: (a) observations at 7 points
in the column, but nudging only at surface level, with nudging coefficient c = 0.01 in
(20), (b) observations and nudging as in (a), but with c = 0.15, (c) observations and
nudging at 7 points, with c = 0.0025, (d) observations and nudging at 4 points, with
c = 0.015. Results are unstable, but for all cases the correct value of the parameter
can be identified. State variables, plotted in (e) for observations and nudging at all
levels, also do not converge completely over the time interval shown.

In a 3-dimensional model, soil moisture availability M(x,y) is a slowly
varying function of ground position (x, y) almost everywhere. It is not slowly 
varying only for a set of positions of measure zero, at which land cover abruptly 
changes. It is suggested that the Lyapunov function approach can be readily 
extended so as to estimate such a parameter field that is slowly varying almost 
everywhere. 

One simply introduces a Lyapunov function of the form:

$$ L = \dots + B \left[ (1 - l_{x,y,x',y'})^2 \, (l_{x,y,x',y'})^2 \right] + C\,\Phi(l) \qquad (22) $$


and derives dynamical equations such that L is monotonically decreasing. The 
term in (22) with coefficient A tends to force smooth spatial variation in M 
except at locations where the new field $l$ has a value near unity. The variable
$l_{x,y,x',y'}$ is conceptually located at a position between $(x,y)$ and $(x',y')$, and
is intended to represent a linear discontinuity in land cover. $N(x,y)$ denotes
the local neighborhood of point $(x,y)$. The term with coefficient $B$ tends to
binarize the values of $l$, so that either $l \approx 0$ or $l \approx 1$. The expression $\Phi(l)$
(multiplied by an arbitrary coefficient $C$) denotes a collection of terms that
tends to make the discontinuities along which $l \approx 1$ one-dimensional in $(x,y)$-space,
by inhibiting neighboring parallel “edges”, and favoring neighboring
contiguous edges.

While the suggested extension of the single-column approach has not 
yet been tested, it can be expected to succeed on theoretical and empirical 
grounds. The use of multiple neighboring columns to estimate a local soil parameter
promises improvement in principle. The treatment of discontinuities
resembles methods that have been effectively applied to image segmentation
[13].


5 Concluding Remarks: From Parameter Estimation 
to Model Learning 

The extension of the parameter estimation method to 3D suggests a further 
extension to qualitative model learning. For problems of qualitative model 
optimization, as for the estimation of a 2D parameter field, the requisite Lyapunov
function has multiple local optima, as does (22). The optimization
problem contrasts with those described by a quadratic Lyapunov function
that possesses a single basin of attraction. Unlike the quadratic case, a stochastic
component in the adaptation procedure might play an essential role. The
stochastic component would allow jumps among the basins of attraction of 
the different local optima defined by the deterministic scheme, as in Fig. 7. 

Fig. 7: Deterministic parameter estimation rules cause parameters to reach local
optima of the Lyapunov function L. Stochasticity (e.g. in a simulated annealing 
algorithm) allows jumps among different basins of attraction. 


The resulting approach would resemble that of a genetic algorithm, with a 
“mutation rate” proportional to synchronization error. 

The synchronization approach to data assimilation and model learning 
stands in contrast to the use of ensembles to effectively estimate background 
error [14]. In an anthropomorphic view of machine learning, the synchronization
approach appears more natural - one learns “on the fly”, rather than
forming multiple copies of oneself to test alternative possibilities. In the case
of estimating slowly varying parameters, one is effectively using ergodicity to
replace an ensemble average by a time average, computed dynamically, as in
(9) and (21). For the full three-dimensional WRF model, the single-column
version of which was discussed in the last section, such a replacement is essential,
since the dimensionality of mesoscale models precludes the use of a
large enough ensemble, with currently available computational resources. 


Abstract. While neural network models have provided nonlinear generalizations of
classical linear multivariate models (e.g. regression, principal component and canonical
correlation analyses), their applications to the analysis and prediction of real
environmental and climate data are not always successful as many of the datasets are
very noisy and/or contain relatively few independent observations. We review recent
efforts directed towards making the nonlinear models more robust - the development
of (1) an information criterion to alleviate overfitting in nonlinear principal component
analysis, and (2) a robust version of nonlinear canonical correlation analysis.
We also discuss two common causes undermining nonlinear models relative to linear 
models: (1) Time-averaging of data (e.g. from daily data to seasonal data) linearizes 
the relation between predictor and predictand due to the central limit theorem. (2) 
When new predictor data lies outside the training range, the nonlinear model may 
extrapolate poorly, thereby decreasing its forecast skills. 


Keywords: Nonlinear principal component analysis, Nonlinear canonical 
correlation analysis, Neural networks 


1 Introduction 

The classical tools for multivariate statistical analysis include linear regression
(LR), classification, principal component analysis (PCA) and canonical
correlation analysis (CCA). These popular methods suffer from the limitation
of being linear. Since the late 1980s, neural network (NN) methods have become
popular for performing nonlinear regression (NLR) and classification [1].
More recently, NN methods have been extended to perform nonlinear PCA
(NLPCA) and nonlinear CCA (NLCCA) [2]. 

In PCA, a given dataset is approximated by a straight line, which minimizes
the mean square error (MSE) - pictorially, in a scatterplot of the data,
the straight line found by PCA points in the dominant direction of the dataset.
In NLPCA, the straight line in PCA is replaced by a curve which minimizes
the MSE. NLPCA can be performed by a variety of methods, e.g. the autoassociative
neural network (NN) model [3, 4], and the kernel PCA model [5].
NLPCA belongs to the class of nonlinear dimensionality reduction techniques,
which also includes principal curves [6], locally linear embedding (LLE) [7] and
isomap [8]. The self-organizing map (SOM) [9] can also be regarded as a discrete
version of NLPCA.
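
The straight-line picture of PCA described above can be illustrated for two-dimensional data. The following is a minimal sketch, not code from the chapter: the function names `pca_line_2d` and `line_mse` and the small synthetic dataset are assumptions made for illustration. It finds the dominant direction from the 2x2 covariance matrix and shows that the resulting line has a smaller mean square error than an arbitrary alternative direction.

```python
import math


def pca_line_2d(points):
    """Return (centre, unit direction) of the leading PCA mode of 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def line_mse(points, centre, direction):
    """Mean squared distance between points and their projections on the line."""
    total = 0.0
    for px, py in points:
        dx, dy = px - centre[0], py - centre[1]
        along = dx * direction[0] + dy * direction[1]
        total += dx * dx + dy * dy - along * along
    return total / len(points)


# Hypothetical data scattered about the line y = 2x.
points = [(0.1 * i, 0.2 * i + 0.05 * (-1) ** i) for i in range(-10, 11)]
centre, direction = pca_line_2d(points)
mse_pca = line_mse(points, centre, direction)
mse_x_axis = line_mse(points, centre, (1.0, 0.0))
print(direction, mse_pca, mse_x_axis)
```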





Fig. 1: (a) A schematic diagram of the autoassociative feed-forward multi-layer
perceptron NN model for performing NLPCA. Between the input layer x on the
left (the 0th layer) and the output layer x' on the far right (the 4th layer), there are
3 layers of “hidden” neurons (the 1st, 2nd and 3rd layers). Layer 2 is the “bottleneck”
with a single neuron u giving the nonlinear principal component (NLPC). Layers
1 and 3 each have m hidden neurons. (b) The NN model used for extracting a
closed curve NLPCA solution. At the bottleneck, there are now two neurons p and q
constrained to lie on a unit circle in the p-q plane, giving effectively one free angular
variable $\theta$, the NLPC.


To perform NLPCA, the NN model (Fig. 1a) is a standard feed-forward
(multi-layer perceptron) NN with 3 “hidden” layers of variables or “neurons”
sandwiched between the input layer x on the left and the output layer x'
on the right, where the middle hidden layer has only a single “bottleneck”
neuron u. As an autoassociative model, the MSE between the output x' and
the input x is minimized, and data compression is achieved by the bottleneck,
yielding the nonlinear principal component (NLPC) u (see Appendix A for
details). Model complexity can be increased by increasing m, the number of
hidden neurons in layer 1 and in layer 3 of the NN (Fig. 1a). Common PCA
algorithms also extract higher PCA modes, with the directions of these modes
being orthogonal to each other. In NLPCA, upon subtracting the NLPCA
mode 1 solution from the data, the residual can be input into the NLPCA
model again to extract the next mode, and so forth, although the orthogonality
property of the PCA modes is lost in NLPCA.

Relative to the various other choices for nonlinear dimensionality reduction
(kernel PCA, principal curves, LLE and isomap), NLPCA has some nice
properties: For instance, its NN architecture provides a continuous (and differentiable)
mapping function from x to the NLPC u, and also from u to
x'. In contrast, in kernel PCA, the inverse mapping from u to x' is a very
difficult problem, which until recently lacked numerically stable algorithms
[10]. When a new datum $x_{new}$ becomes available, its NLPC and its projection
onto the NLPCA mode are easily obtained from the NN mapping functions 
in NLPCA, whereas methods such as principal curves, LLE and isomap do 
not automatically provide mapping functions to handle the new datum. 

The simple correlation between two variables x and y has been generalized 
in multivariate analysis by CCA, which finds the strongest correlated mode(s) 
between two sets of variables, x and y. The first CCA mode extracts the 
linear oscillation in the x-space which is most strongly correlated with a linear 
oscillation in the y-space. NLCCA removes the restriction of only looking for 
linear oscillations in the two spaces. Various approaches based on NN and 
kernel methods have been proposed for NLCCA [11, 12, 13, 14, 15]. Reference 
[12] used three feedforward NN mappings to perform NLCCA (Fig. 2) (details 
in Appendix B). 

When using nonlinear machine learning methods such as NN, the presence
of noise in the data can lead to overfitting (i.e. fitting to the noise).
Regularization (e.g. the addition of weight penalty or decay terms in the cost
functions in NN models) has been commonly used to control overfitting by
limiting model complexity (i.e. the effective number of model parameters) via
the size of the weight penalty parameter(s) [1]. A larger weight penalty parameter
P tends to give less nonlinear solutions than a smaller P. Typically,
to find the appropriate P in nonlinear regression and classification, a number 
of models are trained with different P values. The models’ MSE are validated 
on independent data not used in the model training stage, and the model 
with the lowest validated MSE is selected as the best. Alternatively, Bayesian 
methods have been developed to automatically estimate the size of the weight 
penalty parameter in nonlinear regression and classification problems [16, 17]. 
Since overfitting is much more serious in NLPCA than in NLR [18], a different 
approach is needed. 

In this chapter, we review the recent efforts directed towards making the 
nonlinear multivariate methods more robust in dealing with noisy data - 

Fig. 2: The three feed-forward NNs used to perform NLCCA. The double-barreled
NN on the left maps from the inputs x and y to the canonical variates u and v. The
cost function $J_1$ forces the correlation between u and v to be maximized. On the
right side, the two NNs map inversely from u and v to the original x and y spaces.
The top NN maps from u to the output layer x', with the cost function $J_2$ basically
minimizing the MSE of x' relative to x. The third NN maps from v to the output
layer y', with the cost function $J_3$ basically minimizing the MSE of y' relative to y.


the use of an information criterion to alleviate overfitting in NLPCA in 
Sect. 2, and a robust version of nonlinear canonical correlation analysis in 
Sect. 3 - new developments since the review of [2]. We also discuss two 
common causes undermining nonlinear models relative to linear models: 
(a) Time-averaging of data (e.g. from daily data to seasonal data) linearizes 
the relation between predictor and predictand, due to the central limit theorem
in statistics (Sect. 4). (b) When new predictor data lies outside the training
range, the nonlinear model may extrapolate more poorly than the linear
model, thereby lowering the forecast skill of the nonlinear model (Sect. 5). 


2 Alleviating Overfitting in NLPCA 

In the limit of infinite sample size, overfitting is not a problem when performing
nonlinear regression on noisy data, since it can be shown that the output
of a flexible enough NLR model approximates the conditional mean of the
target data (Sect. 6.1.3 of [1]). While overfitting can also occur in NLPCA
[4, 18, 19, 20], the situation is actually far worse than in NLR, because even
in the limit of infinite sample size, overfitting is a problem when applying
NLPCA to noisy data. As illustrated in Fig. 3, overfitting in NLPCA can
arise from the geometry of the data distribution, instead of from the relative
scarcity of observations. Here for a Gaussian-distributed data cloud, a nonlinear
model with enough flexibility will find the zigzag solution of Fig. 3b as
having a smaller MSE than the linear solution in Fig. 3a. Since the distance
between the point A and a, its projection on the NLPCA curve, is smaller in
Fig. 3b than the corresponding distance in Fig. 3a, it is easy to see that the
more zigzags there are in the curve, the smaller is the MSE. However, the two
neighbouring points A and B, on opposite sides of an “ambiguity” line [6, 21],
are projected far apart on the NLPCA curve in Fig. 3b. Thus simply searching
for the solution which gives the smallest MSE is not a sufficient criterion for
NLPCA to find the best solution in a highly noisy dataset.







Fig. 3: Schematic diagram illustrating how overfitting can occur in NLPCA of noisy
data (even in the limit of infinite sample size). (a) PCA solution for a Gaussian data
cloud (shaded in grey), with two neighbouring points A and B shown projecting to
the points a and b on the PCA straight line solution. (b) A zigzag NLPCA solution
found by a flexible enough nonlinear model. Dashed lines illustrate “ambiguity”
lines where neighbouring points (e.g. A and B) on opposite sides of these lines are
projected to a and b, far apart on the NLPCA curve.

As the NLPCA model of [3] tends to extract zigzag solutions, [4] added
weight penalty to the NLPCA model, which brought the overfitting under
control. Unfortunately, there was no simple way to objectively estimate the
appropriate P value needed to avoid overfitting (and underfitting), because
with NLPCA, if the overfitting arises from the data geometry (as in Fig. 3b)
and not from the relative scarcity of observations, using independent data to
validate the MSE from the various models is not a viable method for selecting 
the appropriate P. Instead, [18] proposed a new “inconsistency” index for 
detecting the projection of neighbouring points to distant parts of the NLPCA 
curve: 

For each data point $x$ and its nearest neighbour $\tilde{x}$, the NLPC for $x$ and
$\tilde{x}$ are $u$ and $\tilde{u}$, respectively. With $C(u, \tilde{u})$ denoting the (Pearson) correlation
between all the pairs $(u, \tilde{u})$, the inconsistency index $I$ is defined by

$$ I = 1 - C(u, \tilde{u}). \qquad (1) $$

If for some nearest neighbour pairs, $u$ and $\tilde{u}$ are assigned very different values,
$C(u, \tilde{u})$ would have a lower value, leading to a larger $I$, indicating greater
inconsistency in the NLPC mapping. With $u$ and $\tilde{u}$ standardized to having
zero mean and unit standard deviation, (1) is equivalent to

$$ I = \tfrac{1}{2} \langle (u - \tilde{u})^2 \rangle, \qquad (2) $$

where $\langle \cdots \rangle$ denotes averaging over all observations.
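
As an illustration of (1) and (2), the sketch below computes the inconsistency index for a small hypothetical dataset using a brute-force nearest-neighbour search. The function names and the two example mappings (`u_smooth`, `u_zigzag`) are assumptions made for this example; the NLPC values would in practice come from a trained NLPCA model, which is not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def nearest_neighbour_values(points, u):
    """For each point, return the u value of its nearest neighbour."""
    out = []
    for i, p in enumerate(points):
        j_best = min((j for j in range(len(points)) if j != i),
                     key=lambda j: math.dist(p, points[j]))
        out.append(u[j_best])
    return out


def inconsistency_index(points, u):
    """Eq. (1): I = 1 - C(u, u_tilde)."""
    return 1.0 - pearson(u, nearest_neighbour_values(points, u))


def standardize(v):
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / n)
    return [(vi - m) / sd for vi in v]


def inconsistency_index_eq2(points, u):
    """Eq. (2): half the mean squared difference of standardized u and u_tilde."""
    us = standardize(u)
    ns = standardize(nearest_neighbour_values(points, u))
    return 0.5 * sum((a - b) ** 2 for a, b in zip(us, ns)) / len(us)


points = [(0.1 * i, 0.0) for i in range(50)]
u_smooth = [p[0] for p in points]                      # neighbours get similar u
u_zigzag = [(-1) ** i * (i + 1) for i in range(50)]    # neighbours get distant u
i_smooth = inconsistency_index(points, u_smooth)
i_zigzag = inconsistency_index(points, u_zigzag)
print(i_smooth, i_zigzag)
```

A mapping that sends neighbouring points to distant values of u gives a much larger index than a smooth mapping, and the two forms (1) and (2) agree to rounding error.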

In statistics, various criteria, often in the context of linear models, have 
been developed to select the right amount of model complexity so neither 
overfitting nor underfitting occurs. These criteria are often called “information 
criteria” (IC), e.g. the Akaike IC [22], the Bayesian IC [23], etc. An IC is 
typically of the form 


IC = MSE + complexity term, (3) 

where MSE is evaluated over the training data and the complexity term is 
larger when a model has more free parameters. The IC is evaluated over a 
number of models with different free parameters, and the model with the 
minimum IC is selected as the best. As the presence of the complexity term 
in the IC penalizes models which use an excessive number of free parameters to
attain low MSE, choosing the model with the minimum IC would rule out 
complex models with overfitted solutions. 

Due to the presence of multiple minima in the cost function, we randomly 
divide the data into a training data set and a validation set (containing 85% 
and 15% of the original data, respectively, in the following examples), and for 
every given value of P and m, we train the model a number of times from 
random initial weights, and discard model runs where the MSE evaluated over 
the validation data is larger than the MSE over the training data. To choose 
among the model runs which have passed the validation test, a new holistic
IC to deal with the type of overfitting arising from the broad data geometry
(Fig. 3b) is introduced as

$$ H = \mathrm{MSE} + \text{inconsistency term} \qquad (4) $$

$$ = \mathrm{MSE} - C(u, \tilde{u}) \times \mathrm{MSE} = \mathrm{MSE} \times I, \qquad (5) $$

where MSE and C are evaluated over all (training and validation) data, inconsistency
is penalized, and the model run with the smallest H value is selected
as the best. The general tendency as more model parameters are used
is for MSE to decrease but eventually I increases sharply, thereby producing
a minimum in H. (I itself may also have a minimum, but that minimum
tends to choose a model with too few parameters, thus underfitting the data.)
There is some randomness in the computed H value, since local minima in
the cost function introduce randomness in the MSE. Furthermore, [21] showed
that the NLPC u is not uniquely defined, since $v = g(u)$ for any invertible
function g would give the same MSE and NLPCA approximation. Hence u
is also a source of randomness for I and H. As we have restrained u by
adding normalization conditions in the cost function (A.2), the randomness
introduced by u in I and H does not appear to affect their effectiveness in
practice.

Note that as the inconsistency term only prevents overfitting arising from
the broad data geometry, validation data are still needed to prevent “local”
overfitting from an excessive number of model parameters, since H, unlike (3),
does not contain a complexity term.
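
The two-stage selection described above (validation test first, then minimum H) can be sketched as follows. The run records, their field names and all numerical values are hypothetical, invented only to show the logic; they are not results from [18].

```python
def select_best_run(runs):
    """Pick the run with the smallest H = MSE x I among runs that pass validation.

    Each run is a dict with hypothetical keys:
      mse_train, mse_valid : MSE over the training and validation subsets
      mse_all              : MSE over all (training and validation) data
      inconsistency        : the inconsistency index I over all data
    """
    # Discard runs whose validation MSE exceeds the training MSE.
    eligible = [r for r in runs if r["mse_valid"] <= r["mse_train"]]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["mse_all"] * r["inconsistency"])


runs = [
    {"name": "zigzag", "mse_train": 0.40, "mse_valid": 0.39,
     "mse_all": 0.40, "inconsistency": 1.50},
    {"name": "smooth", "mse_train": 0.55, "mse_valid": 0.54,
     "mse_all": 0.55, "inconsistency": 0.60},
    {"name": "failed", "mse_train": 0.30, "mse_valid": 0.70,
     "mse_all": 0.36, "inconsistency": 0.50},
]
best = select_best_run(runs)
print(best["name"])
```

Here the run with the lowest MSE among eligible runs ("zigzag") is not selected, because its larger inconsistency index gives it a larger H.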

A test problem was set up in [18]: For a random number $t$ uniformly
distributed in the interval $(-1, 1)$, the signal was generated by using a
quadratic relation

$$ x_1^{(s)} = t, \quad x_2^{(s)} = t^2. \qquad (6) $$

Isotropic Gaussian noise (with variance being one half the average variance
of $x_1^{(s)}$ and $x_2^{(s)}$) was then added to the signal to give the noisy data $x$
with 500 observations. NLPCA was performed on the data using the network
in Fig. 1a with $m = 4$ ($m$ being the number of hidden neurons in the first and
in the third hidden layers of the NN) and with the weight penalty parameter
$P$ at various values (10, 1, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, 0). For each value of
$P$, the model training was done 30 times starting from random initial weights,
and model runs where the MSE evaluated over the validation data was larger 
than the MSE over the training data were deemed ineligible. In the traditional 
approach, among the eligible runs over the range of P values, the one with the 
lowest MSE over all (training and validation) data was selected as the best. 
Figure 4a shows this solution where the zigzag curve retrieved by NLPCA 
is very different from the theoretical parabolic signal (6), demonstrating the 
pitfall of selecting the lowest MSE run. 
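
A dataset of the kind described in this test problem can be generated as sketched below. This is an illustrative reconstruction from the description in the text only; the function name `make_parabola_data`, the random seed and the use of sample variances to set the noise level are assumptions.

```python
import random


def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)


def make_parabola_data(n=500, seed=0):
    """Noisy samples of the parabolic signal x1 = t, x2 = t**2, t in (-1, 1)."""
    rng = random.Random(seed)
    t = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x1_signal = t[:]
    x2_signal = [ti ** 2 for ti in t]
    # Noise variance = one half of the average variance of the two signal components.
    noise_var = 0.5 * 0.5 * (variance(x1_signal) + variance(x2_signal))
    noise_sd = noise_var ** 0.5
    data = [(a + rng.gauss(0.0, noise_sd), b + rng.gauss(0.0, noise_sd))
            for a, b in zip(x1_signal, x2_signal)]
    return data, noise_sd


data, noise_sd = make_parabola_data()
print(len(data), round(noise_sd, 3))
```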

In contrast, in Fig. 4b, among the eligible runs over the range of P values, 
the one with the lowest information criterion H was selected. This solution, 

[Fig. 4 panel titles: (a) Min. MSE solution (P = 0.0001); (b) Min. IC solution (P = 0.1); (c) Min. IC with MAE norm (P = 0)]

Fig. 4: The NLPCA solution (shown as densely overlapping black circles) for the
synthetic dataset (dots), with the parabolic signal curve indicated by and the 
linear PCA solution by the dashed line. The solution was selected from the multiple 
runs over a range of P values based on (a) minimum MSE, (b) minimum IC H, and 
(c) minimum IC together with the MAE norm. 

which has a much larger weight penalty (P = 0.1) than that in Fig. 4a (P =
$10^{-4}$), shows less wiggly behaviour and better agreement with the theoretical
parabolic signal.

Even less wiggly solutions can be obtained by changing the error norm 
used in the cost function from the mean square error to the mean absolute 
error (MAE), i.e. replacing $\langle \|x - x'\|^2 \rangle$ by $\langle \sum_j |x_j - x'_j| \rangle$ in Eq. (A.2). The
MAE norm is known to be robust to outliers in the data (p. 210 of [1]). Fig. 4c
is the solution selected based on minimum H with the MAE norm used.
While wiggles are eliminated, the solution underestimates the curvature in 
the parabolic signal. 

The H IC approach was also tested on a real climate dataset [18], namely 
the tropical Pacific sea surface temperature (SST), where the interannual variability
is dominated by the El Niño-Southern Oscillation (ENSO) phenomenon
[24]. The monthly SST anomalies (1948-2005) were obtained by removing the
climatological seasonal cycle (i.e. subtracting from each monthly SST value 
the climatological mean value for that month). (NLPCA can be performed 
even if the climatological seasonal cycle is not removed, as was done in [4].) 
The 7 leading principal components (PC) containing 86.5% of the variance 
were retained, and served as the inputs for the NLPCA model. 

NLPCA was performed over a range of m and P values (m = 2,..., 6, and
P = 10, 1, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, 0). For each combination of m and
P, 100 runs starting from random initial weights were made. Among all the
runs made over the whole range of m and P values, the one with the lowest 
H was selected as the best (with m = 5, P = 0). The NLPCA solution is a 
curve in the 7-dimensional PC space. 

We next compare the best solutions found for different hidden neuron
number m. Since the overall best solution based on minimum H was for m = 5,
we also show the best solutions found for m = 2 and m = 6 in the PC1-PC2
plane (Fig. 5). The (normalized) MSE for the 3 solutions in Fig. 5 are
0.898 (m = 2), 0.857 (m = 5) and 0.826 (m = 6), where for easy comparison
with the linear mode, the values for the NLPCA solution have been divided
by that from the PCA mode 1. For the (normalized) inconsistency index I,
the values are 0.896 (m = 2), 0.879 (m = 5) and 0.946 (m = 6), while for
the (normalized) H IC, the corresponding values are 0.804, 0.753 and 0.782
respectively. Hence the m = 6 solution has a lower MSE than the m = 5
solution, but the increased inconsistency from its wiggly curve (Fig. 5c) led
to a larger I and a larger H. Compared to the m = 2 solution, the m = 5
solution has both lower MSE and lower I.
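
The reported values can be checked against the definition H = MSE x I. The short sketch below uses only the normalized MSE and I values quoted in the text; the dictionary layout is an assumption for illustration. Because the quoted inputs are rounded to three decimals, the products agree with the quoted H values only to within about 0.001.

```python
# Normalized (MSE, I) values quoted in the text for m = 2, 5 and 6.
reported = {2: (0.898, 0.896), 5: (0.857, 0.879), 6: (0.826, 0.946)}
quoted_h = {2: 0.804, 5: 0.753, 6: 0.782}

h = {m: mse * inc for m, (mse, inc) in reported.items()}
best_m = min(h, key=h.get)

for m in sorted(h):
    print(f"m = {m}: H = {h[m]:.4f} (quoted {quoted_h[m]})")
print("minimum H at m =", best_m)
```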

The tropical Pacific SST example illustrates that with a complicated oscillation
like the El Niño-La Niña phenomenon, using a linear method such as
PCA results in the nonlinear mode being scattered into several linear orthogonal
modes (in fact, all 3 leading PCA modes are related to this phenomenon)
[4]. This brings to mind the famous parable of the three blind men and their
disparate descriptions of an elephant - hence the importance of the NLPCA
as a unifier of the separate linear modes. In the study of climate variability,

[Fig. 5 panel titles: (a) m = 2; (b) m = 5; (c) m = 6; horizontal axis: PC1]


Fig. 5: The best NLPCA mode 1 solution (for the SST anomaly data) selected for
(a) m = 2, (b) m = 5 and (c) m = 6. The solution is shown only in the PC1-PC2
plane, with the linear PCA mode 1 solution indicated by the dashed line. The warm
El Niño episodes are represented by dots in the upper right corner and the cool La
Niña episodes, in the upper left corner.

the wide use of PCA methods has created the not entirely accurate view that
our climate is dominated by a number of spatially fixed oscillatory patterns, 
which is in fact due to the limitation of the linear method. Applying NLPCA 
to the tropical Pacific SSTA, one finds no spatially fixed oscillatory patterns, 
but an oscillation evolving in space as well as in time [2]. 

While the NLPCA is capable of finding a continuous open curve solution, 
there are many phenomena involving waves or quasi-periodic fluctuations, 
which call for a continuous closed curve solution. Reference [25] introduced 
an NLPCA with a circular node at the network bottleneck [henceforth referred 
to as the NLPCA(cir)], so that NLPCA(cir) is capable of approximating the 
data by a closed continuous curve. Fig. 1b shows the NLPCA(cir) network,
which is identical to the NLPCA of Fig. 1a except at the bottleneck, where
there are now two neurons p and q constrained to lie on a unit circle in the p-q
plane, so there is effectively only one free angular variable $\theta$, the NLPC (see
Appendix A). This network has also been used to perform nonlinear singular
spectrum analysis [26]. 

Although NLPCA(cir) is designed for extracting closed curve solutions, 
it is also capable of extracting an open curve solution. The reason is that if 
the input data mapped onto the p-q plane covers only a segment of the unit 
circle instead of the whole circle, then the inverse mapping from the p-q space 
to the output space will yield a solution resembling an open curve. Hence, 
NLPCA(cir) may extract either a closed curve or an open curve approximation 
to a dataset. The IC H not only alleviates overfitting in open curve solutions,
but also chooses between open and closed curve solutions. The inconsistency
index and the IC are now obtained from

$$ I = 1 - \tfrac{1}{2} \left[ C(p, \tilde{p}) + C(q, \tilde{q}) \right], \quad \text{and} \quad H = \mathrm{MSE} \times I, \qquad (7) $$

where $p$ and $q$ are from the bottleneck (Fig. 1b), and $\tilde{p}$ and $\tilde{q}$ are the corresponding
nearest neighbour values.

For a test problem, consider a Gaussian data cloud (with 500 observations)
in 2-dimensional space, where the standard deviation along the $x_1$ axis was
double that along the $x_2$ axis. The data set was analyzed by the NLPCA(cir)
model with m = 2,..., 5 and P = 10, 1, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, 0.
From all the runs, the solution selected based on the minimum MSE has
m = 5 (and P = $10^{-5}$) (Fig. 6a), while that selected based on minimum
H has m = 3 (and P = $10^{-5}$) (Fig. 6b). The minimum MSE solution has
(normalized) MSE = 0.370, I = 9.50 and H = 3.52, whereas the minimum H
solution has the corresponding values of 0.994, 0.839 and 0.833, respectively. 
Thus the IC correctly selected a nonlinear solution (Fig. 6b) which is similar 
to the linear solution. (Due to finite sample size, the curve solution does not 
exactly match the straight line, which would require infinite sample size). The 
IC also rejected the closed curve solution of Fig. 6a, in favour of the open curve 
solution of Fig. 6b, despite its much larger MSE. 

[Fig. 6 panel titles: (a) Min. MSE solution (m = 5, P = $10^{-5}$); (b) Min. IC solution (m = 3, P = $10^{-5}$)]


Fig. 6: The NLPCA(cir) mode 1 for a Gaussian dataset, with the solution selected 
based on (a) minimum MSE and (b) minimum IC. The PCA mode 1 solution is 
shown as a dashed line. 


The IC also performed well when there is a strongly nonlinear signal in the 
noisy data, as demonstrated in Fig. 9 of [18]. For real data, the method was 
applied successfully to the quasi-biennial oscillation (QBO) in the equatorial 
stratospheric wind [18]. In summary, the application of the IC to NLPCA 
and NLPCA(cir) has been successful in model selection, i.e. choosing the best 
model which neither overfits nor underfits the data. 


3 Robust NLCCA

NLCCA, the nonlinear analog of CCA outlined in Appendix B, is able to 
describe coupled nonlinear variability between two multivariate datasets. It 
has been used to study the nonlinear relation in the tropical Pacific between 
the sea level pressure (SLP) field and the SST field [27], and between the wind 
stress and the SST [28], as well as the nonlinear relation between the tropical 
Pacific SST and the extratropical atmospheric variability [29]. 

Due to its complicated architecture, NLCCA is prone to overfitting, particularly
when applied to the short, noisy datasets common in climate studies.
We explore the use of robust cost functions as a means of improving the performance
of NLCCA. The basic model architecture is kept intact. Instead, the
cost functions used to set the model parameters are replaced with more robust 
versions. A cost function based on the biweight midcorrelation [30] replaces 
one based on the Pearson correlation, and cost functions based on MAE can 
be used to replace ones based on MSE. 

The Pearson correlation is not a robust measure of association between two 
variables, as its estimates can be affected by the presence of a single outlier 
[30]. For short, noisy datasets the cost function $J_1$ [Eq. (B.3) in Appendix B]
using the Pearson correlation (cor(u, v)) may lead to overfitting in NLCCA.
For instance, when applying NLCCA to detect the relation between the tropical
Pacific SLP and SST, [31] found that a spurious correlation of 1.00 was
obtained by NLCCA. In this case, both the SLP and SST data contained the
very strong El Niño signal during 1997-1998, the strongest El Niño in the
data record from 1948-2003. The double-barreled NN on the left hand side of
Fig. 2 then used strongly nonlinear mapping functions to produce canonical
variates u and v with extremely large magnitude during 1997-1998, leading
to the spuriously high value of cor(u, v) = 1.00. In contrast, the correlation
obtained after excluding the (u,v) values during 1997-1998 was only 0.28. 

Robust correlation coefficients, including the Spearman rank correlation 
and the biweight midcorrelation, are reviewed by [30]. After testing both the 
Spearman correlation and the biweight midcorrelation, [31] proposed replacing
the non-robust Pearson correlation by the robust biweight midcorrelation
“bicor” as defined by Eq. (B.12), i.e. the cost function $J_1$ for the NN on the
left hand side of Fig. 2 has cor(u, v) in (B.3) replaced by bicor(u, v). For the
two NNs on the right hand side of Fig. 2, one further has the option of replacing
the $L_2$ error norm with the robust $L_1$ norm in (B.6) and (B.7), i.e.
replacing the MSE by the MAE in these cost functions.
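
The sensitivity of the Pearson correlation to a single outlier, and the robustness of the biweight midcorrelation, can be illustrated with a small example. The `bicor` function below follows a commonly used definition based on the median and the median absolute deviation (MAD), with weights that vanish for points more than 9 MAD from the median; it is assumed, not verified, to correspond to Eq. (B.12). The data are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def bicor(x, y):
    """Biweight midcorrelation (common median/MAD form; assumes MAD > 0)."""
    def weighted_deviations(v):
        med = statistics.median(v)
        mad = statistics.median([abs(vi - med) for vi in v])
        out = []
        for vi in v:
            u = (vi - med) / (9.0 * mad)
            w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0   # outliers get zero weight
            out.append((vi - med) * w)
        return out

    a, b = weighted_deviations(x), weighted_deviations(y)
    num = sum(ai * bi for ai, bi in zip(a, b))
    den = math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))
    return num / den


# Perfectly anti-correlated data plus one extreme outlier at (50, 50).
x = [i / 10 for i in range(-20, 21)] + [50.0]
y = [-xi for xi in x[:-1]] + [50.0]
r_pearson = pearson(x, y)
r_bicor = bicor(x, y)
print(f"Pearson = {r_pearson:.3f}, bicor = {r_bicor:.3f}")
```

The single outlier is enough to turn a perfect negative Pearson correlation into a strongly positive one, whereas the biweight midcorrelation remains close to -1.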

Reference [31] used the synthetic test problem of [12] to compare the performance
of the robust and non-robust versions of NLCCA. The synthetic
data contains two correlated modes plus noise. The first correlated mode (x
and y) is given by

$$ x_1 = t - 0.3t^2, \quad x_2 = t + 0.3t^2, \quad x_3 = t^2, \qquad (8) $$

$$ y_1 = t^3, \quad y_2 = -t + 0.3t^3, \quad y_3 = t + 0.3t^2, \qquad (9) $$

where $t$ is a uniformly distributed random number in $[-1, 1]$. The second
correlated mode (x and y) is given by

$$ x_1 = -s - 0.3s^2, \quad x_2 = s - 0.3s^3, \quad x_3 = -s^4, \qquad (10) $$

$$ y_1 = \mathrm{sech}(4s), \quad y_2 = s + 0.3s^3, \quad y_3 = s - 0.3s^2, \qquad (11) $$

where $s$ is a uniformly distributed random number in $[-1, 1]$. The shapes of
the correlated modes are given in [12] and [31].

To test the performance of the NLCCA models, 50 training and test 
datasets, each with 500 observations, were randomly generated from Eqs. 
(8-11). The signal in each dataset was produced by adding the second mode 
to the first mode, with the variance of the second equal to one third that of 
the first. Normally distributed random noise with standard deviation equal 
to 50% of the signal standard deviation was added to the data. The variables 
were then standardized to zero mean and unit standard deviation. 
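
The data-generation recipe above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by [31]: the function names are hypothetical, and two details are assumptions, namely that the one-third variance ratio is imposed on the total variance summed over the three variables, and that the noise is scaled separately for each variable.

```python
import numpy as np

def first_mode(t):
    """First correlated mode, Eqs. (8) and (9)."""
    x = np.column_stack([t - 0.3 * t**2, t + 0.3 * t**2, t**2])
    y = np.column_stack([t**3, -t + 0.3 * t**3, t + 0.3 * t**2])
    return x, y

def second_mode(s):
    """Second correlated mode, Eqs. (10) and (11); sech(z) = 1 / cosh(z)."""
    x = np.column_stack([-s - 0.3 * s**2, s - 0.3 * s**3, -s**4])
    y = np.column_stack([1.0 / np.cosh(4 * s), s + 0.3 * s**3, s - 0.3 * s**2])
    return x, y

def make_dataset(n_obs=500, rng=None):
    """Return standardized (x, y) arrays, each of shape (n_obs, 3)."""
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(-1.0, 1.0, n_obs)
    s = rng.uniform(-1.0, 1.0, n_obs)
    out = []
    for m1, m2 in zip(first_mode(t), second_mode(s)):
        # Assumption: scale mode 2 so its total variance is 1/3 that of mode 1.
        scale = np.sqrt(m1.var(axis=0).sum() / (3.0 * m2.var(axis=0).sum()))
        signal = m1 + scale * m2
        # Assumption: noise std is 50% of the signal std, variable by variable.
        noise = 0.5 * signal.std(axis=0) * rng.standard_normal(signal.shape)
        data = signal + noise
        out.append((data - data.mean(axis=0)) / data.std(axis=0))
    return out[0], out[1]

x_train, y_train = make_dataset(500, np.random.default_rng(1))
```

Calling `make_dataset` 50 times with different random generators would give one possible set of training or test datasets of the kind described above.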

NLCCA models with different combinations of the non-robust (cor and 
MSE) and robust (bicor and MAE) cost functions were developed on the 
training datasets and applied to the test datasets. All NNs had three neurons
in their hidden layers and were trained without weight penalty terms. To avoid
local minima in the cost functions, each network in Fig. 2 was trained 30 times 
from different random initial weights and biases. The network with the lowest 
value of its associated cost function was then selected for use and applied to 
the test data. 

Root MSE (RMSE) values between the first synthetic mode and the first
mode extracted by NLCCA models with different combinations of non-robust
and robust cost functions are shown in Fig. 7 for the 50 test datasets. On
average, all models performed approximately the same, although, for the leading
NLCCA mode of the x dataset, NLCCA with bicor/MSE cost functions
yielded the lowest median RMSE (0.44), followed by NLCCA with bicor/MAE
(0.45) and NLCCA with cor/MSE (0.45). NLCCA with cor/MAE performed
worst with a median RMSE of 0.47. Median RMSE values and relative rankings
of the models were the same for the leading NLCCA mode of the y
dataset.

Of the four models, NLCCA with the robust cost functions (bicor/MAE) 
was the most stable. No trial yielded an RMSE in excess of the series standard 
deviation of one, with the maximum value under 0.6 for the x mode. The other 
models had at least one trial with an RMSE value greater than one, which is
indicative of severe overfitting. Maximum values for the x mode ranged from
1.8 for NLCCA with bicor/MSE, to 47.4 for NLCCA with cor/MSE, and 49.6 
for cor/MAE. NLCCA with bicor/MAE performed similarly for the y mode, 
although two trials with RMSE greater than 20 were found for NLCCA with 
bicor/MSE cost functions. 

Overall, results for the synthetic dataset suggest that replacing the cor/MSE
cost functions in NLCCA with bicor/MAE cost functions leads to a more stable
model that was less susceptible to overfitting and poor test performance.
All models were run without weight penalty terms in this comparison. In
practice, the non-robust models will need weight penalty terms to reduce
overfitting, as is done in the next test.

Fig. 7: Boxplots showing the distribution of RMSE between the first synthetic mode
and the first mode extracted by NLCCA models for (a) x and (b) y with different
combinations of non-robust and robust cost functions over 50 trials. Boxes extend
from the 25th to 75th percentiles, with the line indicating the median. Whiskers
represent the most extreme data within ±1.5 times the interquartile range (i.e., the
box height); values outside this range are plotted as dots. The dashed line indicates
an RMSE equal to one. The ordinate is log-scaled to accommodate the large range in
RMSE.

Reference [31] applied NLCCA to tropical Pacific monthly SLP and SST
data from 1948 to 2003. The climatological seasonal cycle was removed, data
were detrended by removing the long-term linear trend, and a 3-month running
mean filter was applied. After PCA, the first 6 SST PCs (accounting for
73% of the total SST variance) and the first 6 SLP PCs (accounting for 80% of
the variance) were retained for further analysis.

Three variants of the NLCCA model were applied to the SLP and SST 
datasets. The first, representing the standard NLCCA model, incorporated 
both non-robust cost functions (cor/MSE). The second and third used the 
bicor cost function to train the double-barreled network and either the MAE 
or MSE cost function to train the inverse mapping networks. 















112 W.W. Hsieh and A.J. Cannon 


To assess the usefulness of the three variants of NLCCA for seasonal forecasting,
models were validated on the basis of their forecast performance.
PC scores from the SLP dataset were used to predict PC scores from the
SST dataset at lead times of 0, 3, 6, 9, and 12 months. (Lead times are defined
as the number of months from the predictor observation to the predictand
observation, e.g., a forecast with a 3-month lead time from January
would be for April.) Taking x to be historical values of the SLP PC
scores and y to be historical values of the SST PC scores, forecasts for a
new case y' were made as follows. First, the double-barreled network was
trained with x and y as inputs and the resulting values of u and v were
used to train the inverse mapping networks. Given a new SLP data point
x, a new value of the canonical variate u was obtained from the double-barreled
network. Regression equations [e.g. Eq. (B.9)] were then used to
predict a new value of v', which was entered into the appropriate inverse
mapping network to give y'. For the second and higher NLCCA modes,
the same procedure was followed using residuals from the previous mode as
inputs.
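
The lead-time convention can be made concrete with a small Python sketch that pairs each predictor observation with the predictand observation `lead` months later. The helper name `lagged_pairs` is hypothetical and is not taken from [31]; it only illustrates the alignment, not the NLCCA forecast itself.

```python
import numpy as np

def lagged_pairs(x, y, lead):
    """Pair predictor x[t] with predictand y[t + lead] (lead in months)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if lead < 0:
        raise ValueError("lead must be non-negative")
    n = len(x) - lead          # number of usable (predictor, predictand) pairs
    return x[:n], y[lead:lead + n]

# Month indices stand in for the PC scores: 0 = January, 3 = April, ...
months = np.arange(24)
x_pred, y_target = lagged_pairs(months, months, lead=3)
```

With a 3-month lead, the first pair links month 0 (January) to month 3 (April), matching the example in the text, and the last three predictor values are dropped because they have no matching predictand.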

Following [27], NNs were trained both with and without weight penalty
terms using two hidden neurons. To avoid overfitting in models trained with
weight penalty, values of the coefficients $P_1$, $P_2$, and $P_3$ in Eqs. (B.3), (B.6) and
(B.7) were determined via 10-fold cross-validation on the training dataset. The
training record was split into 10 contiguous segments. Models were trained on
9 of the 10 segments using weight penalties from the set
$\{10, 1, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 0\}$. Forecasts on the
remaining segment were then
recorded for each weight penalty coefficient. While fixing the weight penalties,
these steps were repeated 9 times, each time making forecasts on a different
segment. Finally, forecasts for all 10 segments were combined and validated
against observations. Weight penalties that minimized the aggregated cross-validation
error were recorded, and NNs were retrained using these penalties on
all 10 segments combined. Ten models were trained in this manner to assess
sensitivity to initial weights and biases.
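
The penalty-selection loop can be sketched as follows. To keep the example short and runnable, a linear ridge regression stands in for the NN, so `ridge_fit` and `select_penalty` are hypothetical names and the model is an assumption; only the cross-validation logic (contiguous segments, aggregated forecasts, retraining on the full record) follows the description above.

```python
import numpy as np

PENALTIES = [10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 0]

def ridge_fit(X, y, penalty):
    """Stand-in model: least squares with an L2 weight penalty."""
    A = X.T @ X + penalty * np.eye(X.shape[1])
    return np.linalg.lstsq(A, X.T @ y, rcond=None)[0]

def select_penalty(X, y, penalties=PENALTIES, n_segments=10):
    """Choose the penalty by cross-validation over contiguous segments."""
    index = np.arange(len(y))
    segments = np.array_split(index, n_segments)   # contiguous blocks
    cv_error = []
    for penalty in penalties:
        forecasts = np.empty(len(y), dtype=float)
        for held_out in segments:
            train = np.setdiff1d(index, held_out)
            w = ridge_fit(X[train], y[train], penalty)
            forecasts[held_out] = X[held_out] @ w
        # Validate the combined forecasts from all segments at once.
        cv_error.append(np.mean((forecasts - y) ** 2))
    best = penalties[int(np.argmin(cv_error))]
    # Retrain on all segments combined with the selected penalty.
    return best, ridge_fit(X, y, best)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] + 0.5 * rng.standard_normal(200)
best_penalty, weights = select_penalty(X, y)
```

Contiguous rather than random segments are used because neighbouring months in a climate record are autocorrelated; random splits would leak information between the training and validation parts.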

A second round of cross-validation was used to estimate out-of-sample 
forecast performance of the models. The historical record was split into 5 
segments (each approximately 11 years in length). Models were trained on 4 of 
the 5 segments using the cross-validation procedure outlined above. Forecasts 
on the remaining segment were then recorded. These steps were repeated 4 
times, each time making forecasts on a different segment. Finally, forecasts 
for all 5 segments were combined and compared with observations. 

Results from NLCCA models with one extracted mode are shown in Fig. 8.
Cross-validated Pearson correlation skill is averaged over the entire tropical
Pacific domain following reconstruction of the SST field from the predicted
SST PCs. Results with weight penalty are only given for the NLCCA model
with cor/MSE cost functions as the addition of penalty terms to models with
the bicor cost function did not generally lead to significant changes in skill.

Without weight penalty, the NLCCA model with cor/MSE cost functions
performed poorly, exhibiting mean skills worse than CCA at all lead times. 
Even with concurrent predictor/predictand fields, the mean correlation skill 
was lower than 0.2. NLCCA with bicor/MSE cost functions and bicor/MAE 
cost functions performed much better, with mean correlation skills exceeding 
0.5 at the 0-month lead time. Over the 10 trials, minimum skills from models 
incorporating the bicor cost function were higher than maximum skills from 
the corresponding cor/MSE models without weight penalty. 

For NLCCA with cor/MSE cost functions, minimum correlations were 
lower than zero (i.e., no cross-validation skill) for 6, 9, and 12-month lead 
times. All NLCCA models with bicor/MSE and bicor/MAE cost functions, 
even those at a 12-month lead time, showed positive skill. In general, NLCCA 
models with bicor exhibited the least variability in skill between repeated 
trials. In no case was the range between minimum and maximum skill greater 
than 0.2. For NLCCA with cor/MSE cost functions, the range in skill exceeded 
0.2 at all lead times, indicating a very unstable model. 

Little difference in skill was evident between bicor/MSE and bicor/MAE 
models, which suggests that the switch from cor to bicor in the double-barreled 
network cost function was responsible for most of the increase in skill relative 
to the standard NLCCA model. 

Results discussed to this point have been for NLCCA models without 
weight penalty. Addition of weight penalty to the standard NLCCA model 
resulted in improvements in the mean correlation skill, although performance 
still lagged behind NLCCA with the bicor cost function at 9 and 12-month 
lead times. At 0, 3, and 6-month lead times, maximum skill over the 10 trials 
did, however, exceed the mean level of skill of the bicor-based models, which 
suggests that an appropriate amount of weight penalty can result in a
well-performing model. However, the wide range in performance over the 10 trials
(e.g., at 0 and 6-month lead times) reflects the instability of the training 
and cross-validation steps needed to choose the weight penalty coefficients. In 
practice, it may be difficult to consistently reach the performance level of the 
robust model by relying solely on weight penalty to control overfitting of the 
standard NLCCA model. 

Returning to the NLCCA models with bicor/MSE and bicor/MAE 
cost functions, little difference in skill between the models is apparent from 
Fig. 8. At short lead times (0 and 3-months), when the signal is strongest, 
the bicor/MSE model performed slightly better than the bicor/MAE model, 
whereas at the longest lead time (12-months), when the signal is weakest, the 
bicor/MAE model performed best (and with less variability among runs). 

NLCCA models with the bicor/MSE and bicor/MAE cost functions tended 
to perform slightly better than CCA. For the bicor/MAE model, the small 
improvement in performance was significant (i.e., minimum skill over the 10 
trials exceeded CCA skill) at 0, 3, 6, and 12-month lead times, while the same 
was true of the bicor/MSE model at 0 and 3-month lead times.

Fig. 8: Cross-validated correlation skill for NLCCA models trained with cor/MSE,
bicor/MSE, and bicor/MAE cost functions, plotted against lead time (months).
Weight penalty was applied to the model denoted cor/MSE(p). Bars show the mean
correlation over the spatial domain, averaged over the 10 trials. Vertical lines
extend from the minimum to the maximum spatial mean correlation from the
10 trials. Horizontal lines show correlation skill from the CCA model for
comparison. The ordinate is limited to showing positive cross-validated skill.


4 Effects of Time-Averaging 

In this section, we discuss one of two main factors undermining the advantage
of nonlinear models over linear models. Time-averaging is widely
used to reduce noise in the data; however, it also linearizes the relations in
the dataset. In a study of the nonlinear relation between the precipitation
rate (the predictand) and 10 other atmospheric variables (the predictors) in
the NCEP/NCAR reanalysis data [32], [33] examined the daily, weekly and
monthly averaged data by nonlinear multiple regression using NN over 3 regions
(British Columbia in Canada, the Middle East, and northeastern China), and
discovered that the strongly nonlinear relations found in the daily data became
dramatically reduced by time-averaging to the almost linear relations
found in the monthly data.

To explain this phenomenon, [33] invoked the well-known central limit theorem
from statistics. For simplicity, consider the relation between two variables
x and y. If $y = f(x)$ is a nonlinear function, then even if x is a normally
distributed random variable, y will in general not have a normal distribution.
Now consider the effects of time-averaging on the (x, y) data. The bivariate
central limit theorem [34] says that if $(x_1, y_1), \ldots, (x_n, y_n)$ are independent
and identically distributed random vectors with finite second moments, then
$(X, Y)$, obtained from averaging $(x_1, y_1), \ldots, (x_n, y_n)$, will, as $n \to \infty$,
approach a bivariate normal distribution $N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, where $\mu_1$ and $\mu_2$
are the means of X and Y, respectively, $\sigma_1^2$ and $\sigma_2^2$ are the corresponding
variances, and $\rho$ the correlation between X and Y.

From the bivariate normal distribution, the conditional probability distribution
of Y (given X) is also a normal distribution [34], with mean

$$E[Y \mid X] = \mu_2 + (X - \mu_1)\,\rho\,\sigma_2/\sigma_1. \tag{12}$$

This linear relation in X explains why time-averaging tends to linearize the
relationship between the two variables. With more variables, the bivariate normal
distribution readily generalizes to the multivariate normal distribution.

To visualize this effect, consider the synthetic dataset

$$y = x + x^2 + \epsilon, \tag{13}$$

where x is a Gaussian random variable with unit standard deviation and $\epsilon$ is
Gaussian noise with a standard deviation of 0.5, so each day's value is independent
of that of the next day. Averaging this "daily" data over 7 consecutive
days and over 30 days reveals a dramatic weakening of the nonlinear relation
(Fig. 9), and the shifting of the y density distribution towards Gaussian with
the time-averaging. With real data, there is autocorrelation in the time series,
so the monthly data will be effectively averaging over far fewer than 30 independent
observations as done in this synthetic dataset. Seasonal data in the
extratropics, however, will probably involve averaging about 30 independent
observations.
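
A rough numerical check of this effect is sketched below in Python. The measure of nonlinearity used here, the gain in explained variance when a quadratic fit replaces a straight-line fit, is an assumption chosen for illustration and is not taken from [33]; the sample size and function names are likewise hypothetical.

```python
import numpy as np

def block_average(a, n):
    """Average consecutive, non-overlapping blocks of n values."""
    m = (len(a) // n) * n
    return a[:m].reshape(-1, n).mean(axis=1)

def r2_of_polyfit(x, y, degree):
    """Fraction of variance of y explained by a polynomial fit in x."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n_days = 210_000
x = rng.standard_normal(n_days)                      # unit standard deviation
y = x + x**2 + 0.5 * rng.standard_normal(n_days)     # Eq. (13)

nonlinear_gain = {}
for window in (1, 7, 30):                            # daily, weekly, monthly
    xa, ya = block_average(x, window), block_average(y, window)
    nonlinear_gain[window] = (r2_of_polyfit(xa, ya, 2)
                              - r2_of_polyfit(xa, ya, 1))
```

Under these assumptions the quadratic term adds a large share of explained variance for the daily data and only a small share after 30-day averaging, in line with the weakening of the nonlinear relation described above.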

If the data has strong autocorrelation, so that the integral time scale from
the autocorrelation function is not small compared to the time-averaging window,
then there are actually few independent observations used during the
time-averaging, and the central limit theorem does not apply. For instance,
the eastern equatorial Pacific sea surface temperatures have an integral time
scale of about a year, hence nonlinear relations can be detected from monthly
or seasonal data, as found by NLPCA and NLCCA. In contrast, the midlatitude
weather variables have integral time scales of about 3-5 days, so
monthly averaged data would have effectively averaged over about 6-10 independent
observations, and seasonal data over 20-30 independent observations,
so the influence of the central limit theorem cannot be ignored.
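
The arithmetic behind these numbers can be sketched as follows. The example assumes one common definition of the integral time scale, $T = 1 + 2\sum_k r_k$ with $r_k$ the lag-k autocorrelation (definitions differ between authors), and uses a synthetic AR(1) series; the function names are hypothetical.

```python
import numpy as np

def integral_time_scale(series, max_lag):
    """Estimate T = 1 + 2 * sum of autocorrelations up to max_lag
    (in units of the sampling interval)."""
    z = np.asarray(series, dtype=float)
    z = z - z.mean()
    var = np.dot(z, z) / len(z)
    acf = [np.dot(z[:-k], z[k:]) / (len(z) * var)
           for k in range(1, max_lag + 1)]
    return 1.0 + 2.0 * np.sum(acf)

def effective_sample_size(window, time_scale):
    """Approximate number of independent observations in an averaging window."""
    return window / time_scale

# Synthetic AR(1) "daily" series; theory gives T = (1 + a) / (1 - a) = 4 days.
rng = np.random.default_rng(0)
a = 0.6
noise = rng.standard_normal(200_000)
series = np.empty_like(noise)
series[0] = noise[0]
for i in range(1, len(noise)):
    series[i] = a * series[i - 1] + noise[i]

T = integral_time_scale(series, max_lag=50)
n_eff_month = effective_sample_size(30, T)
```

With time scales of 3 to 5 days, `effective_sample_size(30, 3)` and `effective_sample_size(30, 5)` give 10 and 6, the range quoted above for monthly averages.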

While time-averaging tends to reduce the nonlinear signal, it also smooths
out the noise. Depending on the type of noise (and perhaps on the type of
nonlinear signal), it is possible that time-averaging may nevertheless enhance
the detection of a nonlinear signal above the noise for some datasets. In short,
researchers should be aware that time-averaging could have a major impact
on our modelling or detection of nonlinear empirical relations.

Fig. 9: Effects of time-averaging on a nonlinear relation. (a) Synthetic "daily" data
from a quadratic relation between x and y, and the data time-averaged over (b) 7
observations and (c) 30 observations. The probability density distribution of y is
shown in (d) for cases (a), (b) and (c).


5 Extrapolation 

We next discuss another factor which undermines the advantage of nonlinear
models over linear models, namely extrapolation. For NLR problems, NN
models (with proper weight penalty so there is neither overfitting nor underfitting)
perform nonlinear interpolation well. However, when presented with
new data where the predictor lies beyond the range of (predictor) values used
in model training, the NN model is then extrapolating instead of interpolating.
We will illustrate the extrapolation behaviour with a simple test problem.
Let the signal be

$$y = x + \ldots \tag{14}$$


We choose 300 observations, with x having unit standard deviation and Gaussian
distribution, and y given by (14) plus Gaussian noise (with the noise
standard deviation the same as the signal standard deviation). With 6 hidden
neurons, the Bayesian NN model from the MATLAB Neural Network Toolbox 
was used to solve this NLR problem. In Fig. 10a, upon comparing with the 
true signal (dashed curve), it is clear that the NN model interpolated better 
than the LR model (solid line), but for large x values, NN extrapolated worse 
than LR. Fig. 10b shows the same data fitted by a fourth order polynomial, 
where for strongly negative x values, the polynomial extrapolated worse than 
LR. Hence nonlinear models which interpolate better than LR provide no 
guarantee that they extrapolate better than LR. In fact, Wu et al. [35] found 
that for seasonal forecasting of the North American surface air temperature, 
the NN model extrapolated worse than LR. 

How the nonlinear model extrapolates depends on the type of nonlinear
model used. With a polynomial fit, as $|x| \to \infty$, $|y| \to \infty$. However, for
NN models (with 1 hidden layer h), where the kth hidden neuron is

$$h_k = \tanh((\mathbf{W}\mathbf{x} + \mathbf{b})_k), \qquad y = \mathbf{w} \cdot \mathbf{h} + b, \tag{15}$$

once the model has been trained, then as $\|\mathbf{x}\| \to \infty$, the tanh function remains
bounded within ±1, hence y remains bounded, in sharp contrast to the
unbounded behaviour with polynomial extrapolation (Fig. 10).
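
The boundedness argument can be checked numerically. In the Python sketch below, all weights and polynomial coefficients are hypothetical random or made-up values, not those of the fitted models in Fig. 10; the point is only that the output of Eq. (15) can never exceed $\sum_k |w_k| + |b|$ in magnitude, whereas a fourth order polynomial grows without limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 6
W = rng.standard_normal((n_hidden, 1))   # hypothetical hidden-layer weights
b_hidden = rng.standard_normal(n_hidden) # hypothetical hidden-layer biases
w_out = rng.standard_normal(n_hidden)    # hypothetical output-layer weights
b_out = 0.3                              # hypothetical output-layer bias

def nn_output(x):
    """Eq. (15) for a single predictor x (array of values)."""
    h = np.tanh(W @ np.atleast_2d(x) + b_hidden[:, None])
    return w_out @ h + b_out

# Because |tanh| <= 1, the NN output is bounded for any input.
nn_bound = np.abs(w_out).sum() + abs(b_out)

# Hypothetical fourth order polynomial, coefficients from highest power down.
poly = np.array([0.05, -0.1, 0.2, 1.0, 0.0])

x_far = np.array([-1e3, -10.0, 10.0, 1e3])
nn_far = nn_output(x_far)
poly_far = np.polyval(poly, x_far)
```

Bounded output does not mean good extrapolation: the NN simply saturates to a constant far from the training range, which may or may not be closer to the true signal than the LR line.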

6 Summary and Conclusion 

The nonlinear generalization of classical multivariate statistical methods by 
machine learning methods such as NN is exciting. However, when applied 
to environmental sciences, the datasets may be very noisy and/or contain 
relatively few independent observations, and the nonlinear methods may fail. 
Thus it is essential that more robust nonlinear methods be developed. 

With noisy data, not having plentiful observations could cause a flexible 
nonlinear model to overfit. In the limit of infinite sample size, overfitting 
cannot occur in nonlinear regression, but can still occur in NLPCA due to 
the geometric shape of the data distribution. A new inconsistency index I 
for detecting the projection of neighbouring points to distant parts of the 
NLPCA curve has been introduced, and incorporated into a holistic IC H 
to choose the model with the appropriate weight penalty parameter and the 
appropriate number of hidden neurons [18]. Tests with synthetic data and 
real climate data indicated that this IC is effective in model selection, and in 
deciding between open curve and closed curve solutions. 

To make NLCCA more robust, non-robust cost functions in the model are
replaced by robust cost functions: the Pearson correlation is replaced by
the biweight midcorrelation, while the MSE in the inverse mapping network
can be replaced by the MAE [31]. Tests showed that replacing the Pearson
correlation by the biweight midcorrelation greatly improved the stability of
the NLCCA model. In contrast, the choice between the MSE or MAE cost
function appears to be more problem dependent, and should be considered
as part of the model selection process. The MATLAB codes for NLPCA and
NLCCA are downloadable from
http://www.ocgy.ubc.ca/projects/clim.pred/download.html.

Fig. 10: (a) Nonlinear regression fit by Bayesian NN. The data are indicated by
crosses and the Bayesian NN solution by circles. Dashed curve indicates the
theoretical signal and solid line the LR solution. (b) Fit to the same dataset but by a
fourth order polynomial.

Two common causes undermining nonlinear models relative to linear models
have also been highlighted. (1) Time-averaging of data (e.g. from daily data
to seasonal data) linearizes the relation between predictor and predictand due
to the central limit theorem [33]. (2) When new predictor data lies outside
the training range, the nonlinear model may extrapolate poorly, thereby
undermining the forecast capability of the nonlinear model [35]. How to improve
the nonlinear model forecast at extrapolation points is currently being
investigated.


Appendix A: NLPCA

With the input variables forming the 0th layer of the network in Fig. 1a, a
neuron $v_j^{(i)}$ at the ith layer (i = 1, 2, 3, 4) receives its value from the neurons
$\mathbf{v}^{(i-1)}$ in the preceding layer, i.e.

$$v_j^{(i)} = f^{(i)}\left(\mathbf{w}_j^{(i)} \cdot \mathbf{v}^{(i-1)} + b_j^{(i)}\right), \tag{A.1}$$

where $\mathbf{w}_j^{(i)}$ is a vector of weight parameters and $b_j^{(i)}$ a bias or offset parameter,
and the transfer or activation functions $f^{(1)}$ and $f^{(3)}$ are the hyperbolic
tangent functions, while $f^{(2)}$ and $f^{(4)}$ are simply the identity functions. Effectively,
a nonlinear function $u = E(\mathbf{x})$ maps from the higher dimension input
space to the lower dimension bottleneck space, followed by an inverse transform
$\mathbf{x}' = G(u)$ mapping from the bottleneck space back to the original space,
as represented by the outputs. To make the outputs as close to the inputs as
possible, the cost function J, basically the MSE, is minimized. More precisely,
[2] used

$$J = \langle \|\mathbf{x} - \mathbf{x}'\|^2 \rangle + \langle u \rangle^2 + (\langle u^2 \rangle - 1)^2 + P \sum_j \|\mathbf{w}_j^{(1)}\|^2, \tag{A.2}$$

where on the right hand side, the first term is the MSE (with $\langle \cdots \rangle$ denoting
an observation or time mean), the second and third terms are for restraining
u towards $\langle u \rangle = 0$ and $\langle u^2 \rangle = 1$, and the final term is a weight penalty
or regularization term, with P the weight penalty parameter. [4] found that
penalizing just the first layer of weights is sufficient to limit the nonlinear
modelling capability of the model. By minimizing J, the values of the weight
and bias parameters are solved. The nonlinear optimization was carried out by
the quasi-Newton algorithm fminunc.m in the MATLAB Optimization Toolbox.
A number of optimization runs were made with random initial values of the
weight and bias parameters, and only runs where the MSE evaluated over
the validation data was not larger than the MSE over the training data were
deemed eligible, with the best solution selected as the one with the smallest
value of $H$, calculated from (5).

To obtain closed curve solutions, we use NLPCA with a circular bottleneck
node (Fig. 1b). At the bottleneck, we first calculate the pre-states $p_o$ and $q_o$
by

$$p_o = \mathbf{w}^{(2)} \cdot \mathbf{v}^{(1)} + b^{(2)}, \quad \text{and} \quad q_o = \tilde{\mathbf{w}}^{(2)} \cdot \mathbf{v}^{(1)} + \tilde{b}^{(2)}, \tag{A.3}$$

then with

$$r = (p_o^2 + q_o^2)^{1/2}, \tag{A.4}$$

the circular node is defined by

$$p = p_o/r, \quad \text{and} \quad q = q_o/r, \tag{A.5}$$

which satisfies the unit circle equation $p^2 + q^2 = 1$. Thus, although there are
two variables p and q at the bottleneck, there is only one angular degree of
freedom ($\theta$) from the circle constraint. For more details, see the review by [2].
The model run having the smallest $H$, as computed from (7), is selected as
the best solution.
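
The normalization in (A.4) and (A.5) is simple enough to sketch directly in Python. The function name `circular_node` is hypothetical, and recovering the angle with `arctan2` is one natural choice rather than something stated in the text.

```python
import numpy as np

def circular_node(p_o, q_o):
    """Project the pre-states (p_o, q_o) onto the unit circle, Eqs. (A.4)-(A.5),
    and return p, q and the single angular degree of freedom theta."""
    r = np.hypot(p_o, q_o)          # r = (p_o^2 + q_o^2)^(1/2)
    p, q = p_o / r, q_o / r
    theta = np.arctan2(q, p)        # angle in (-pi, pi]
    return p, q, theta

p, q, theta = circular_node(np.array([3.0, 0.0, -1.0]),
                            np.array([4.0, 2.0, -1.0]))
```

Whatever the magnitude of the pre-states, the outputs satisfy $p^2 + q^2 = 1$, so only the angle carries information through the bottleneck (the pre-states must not both be zero).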


Appendix B: NLCCA 

Consider a dataset $\{x_i(t)\}$ with i variables and another dataset $\{y_j(t)\}$ with
j variables, where each dataset has $t = 1, \ldots, N$ observations. The variables
$\{x_i(t)\}$ can be grouped to form the vector $\mathbf{x}(t)$ and the variables $\{y_j(t)\}$ can
be grouped to form the vector $\mathbf{y}(t)$. CCA looks for the linear combinations

$$u(t) = \mathbf{a} \cdot \mathbf{x}(t), \qquad v(t) = \mathbf{b} \cdot \mathbf{y}(t) \tag{B.1}$$

such that the Pearson correlation between the canonical variates u and v, i.e.,
cor(u, v), is maximized.

In NLCCA, the nonlinear analog of linear CCA, the linear mappings in
Eq. (B.1) are replaced with nonlinear mappings performed by NN (Fig. 2).
The double-barreled network on the left-hand side nonlinearly maps x to u
and y to v by

$$h_k^{(x)} = \tanh[(\mathbf{W}^{(x)}\mathbf{x} + \mathbf{b}^{(x)})_k], \qquad u = \mathbf{w}^{(x)} \cdot \mathbf{h}^{(x)} + \bar{b}^{(x)},$$
$$h_l^{(y)} = \tanh[(\mathbf{W}^{(y)}\mathbf{y} + \mathbf{b}^{(y)})_l], \qquad v = \mathbf{w}^{(y)} \cdot \mathbf{h}^{(y)} + \bar{b}^{(y)}, \tag{B.2}$$

where $h_k^{(x)}$ and $h_l^{(y)}$ are the hidden-layer neurons; $\mathbf{W}^{(x)}$ and $\mathbf{W}^{(y)}$ are the
hidden-layer weight matrices; $\mathbf{b}^{(x)}$ and $\mathbf{b}^{(y)}$ are the hidden-layer bias vectors;
$\mathbf{w}^{(x)}$ and $\mathbf{w}^{(y)}$ are the output-layer weight vectors; $\bar{b}^{(x)}$ and $\bar{b}^{(y)}$ are the
output-layer biases. The number of hidden-layer neurons controls the overall
complexity of the network; the hidden layer must contain more than one
neuron ($K \ge 2$ and $L \ge 2$, with $k = 1, \ldots, K$ and $l = 1, \ldots, L$) to obtain a
nonlinear solution [27].

Weight and bias parameters in the double-barreled network are obtained
by minimizing the cost function

$$J_1 = -\mathrm{cor}(u, v) + \langle u \rangle^2 + \langle v \rangle^2 + \left(\langle u^2 \rangle^{1/2} - 1\right)^2 + \left(\langle v^2 \rangle^{1/2} - 1\right)^2 + P_1 \left[\sum_{ki} \left(W_{ki}^{(x)}\right)^2 + \sum_{lj} \left(W_{lj}^{(y)}\right)^2\right]. \tag{B.3}$$

The first term maximizes the correlation between the canonical variates u and
v; the second, third, fourth, and fifth terms are normalization constraints that
force u and v to have zero mean and unit variance; the sixth term is a weight
penalty whose relative magnitude is controlled by the parameter $P_1$. Larger
values of $P_1$ lead to smaller weights (i.e., fewer effective model parameters),
which results in a more linear model. If $\tanh(\cdot)$ is replaced by the identity
function, then Eq. (B.2) reduces to Eq. (B.1) and the network performs linear
CCA.

Once the canonical variates u and v have been found, the inverse mappings
to x' and y' are given by the two NNs on the right-hand side of Fig. 2:

$$h_k^{(u)} = \tanh[(\mathbf{w}^{(u)} u + \mathbf{b}^{(u)})_k], \qquad \mathbf{x}' = \mathbf{W}^{(u)} \mathbf{h}^{(u)} + \bar{\mathbf{b}}^{(u)}, \tag{B.4}$$

$$h_l^{(v)} = \tanh[(\mathbf{w}^{(v)} v + \mathbf{b}^{(v)})_l], \qquad \mathbf{y}' = \mathbf{W}^{(v)} \mathbf{h}^{(v)} + \bar{\mathbf{b}}^{(v)}. \tag{B.5}$$

Weight and bias parameters in these two networks are found by minimizing
the cost functions

$$J_2 = \langle \|\mathbf{x}' - \mathbf{x}\|^2 \rangle + P_2 \sum_k \left(w_k^{(u)}\right)^2, \tag{B.6}$$

$$J_3 = \langle \|\mathbf{y}' - \mathbf{y}\|^2 \rangle + P_3 \sum_l \left(w_l^{(v)}\right)^2, \tag{B.7}$$

respectively, where $\|\cdot\|^2$ is the square of the $L_2$-norm, with the $L_p$-norm
given by

$$L_p(\mathbf{e}) = \left(\sum_i |e_i|^p\right)^{1/p}. \tag{B.8}$$

$J_2$ and $J_3$ thus give the MSE between the model predictions and the observed
x and y variables subject to weight penalty terms whose magnitudes
are controlled by the parameters $P_2$ and $P_3$. Once the first mode has been
extracted from the data, the next leading mode can be extracted from the
model residuals, and so on for higher modes.
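
To make the difference between the $L_2$-based MSE and the $L_1$-based MAE concrete, the Python sketch below evaluates the $L_p$-norm of (B.8) and compares the constant prediction that minimizes each error measure on a small, made-up sample containing one outlier. The data values and function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def lp_norm(e, p):
    """L_p-norm of an error vector, Eq. (B.8)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.abs(e) ** p) ** (1.0 / p)

def mse(pred, obs):
    """Mean squared error (based on the squared L_2-norm)."""
    return np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)

def mae(pred, obs):
    """Mean absolute error (based on the L_1-norm)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(obs)))

# Made-up observations; the last value plays the role of an outlier.
obs = np.array([0.1, -0.2, 0.0, 0.3, 25.0])

# The constant minimizing the MSE is the mean, and the constant
# minimizing the MAE is the median.
best_const_mse = obs.mean()
best_const_mae = np.median(obs)
```

The single outlier drags the MSE-optimal constant far from the bulk of the data, whereas the MAE-optimal constant stays near it, which is the sense in which the $L_1$ norm is called robust.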

For seasonal climate prediction tasks, where the goal is to predict values of
a multivariate predictand dataset from a multivariate predictor dataset, e.g.,
$\mathbf{y}' = f(\mathbf{x})$, values of the canonical variate v' must be predicted from values
of the canonical variate u. For canonical variates normalized to unit variance
and zero mean, the linear least-squares regression solution is given by [36]

$$v' = u \,\mathrm{cor}(u, v). \tag{B.9}$$

For robust NLCCA, one needs to calculate the biweight midcorrelation.
First rescale x and y as

$$p = \frac{x - M_x}{9\,\mathrm{MAD}_x}, \qquad q = \frac{y - M_y}{9\,\mathrm{MAD}_y}, \tag{B.10}$$

where $M_x$ and $M_y$ are the median values of x and y respectively and $\mathrm{MAD}_x$
and $\mathrm{MAD}_y$ are the median values of $|x - M_x|$ and $|y - M_y|$ respectively. Next,
the sample biweight midcovariance is given by

$$\mathrm{bicov}(x, y) = \frac{N \sum_t a(t)\, b(t)\, c(t)^2\, d(t)^2\, (x(t) - M_x)(y(t) - M_y)}{\left[\sum_t a(t)\, c(t)\, (1 - 5p(t)^2)\right] \left[\sum_t b(t)\, d(t)\, (1 - 5q(t)^2)\right]}, \tag{B.11}$$

where $a(t) = 1$ if $-1 \le p(t) \le 1$, otherwise $a(t) = 0$; $b(t) = 1$ if $-1 \le q(t) \le 1$,
otherwise $b(t) = 0$; $c(t) = (1 - p(t)^2)$; and $d(t) = (1 - q(t)^2)$. The biweight
midcorrelation is then given by

$$\mathrm{bicor}(x, y) = \frac{\mathrm{bicov}(x, y)}{\sqrt{\mathrm{bicov}(x, x)\,\mathrm{bicov}(y, y)}}. \tag{B.12}$$

The biweight midcorrelation, like the Pearson correlation, ranges from -1
to +1.
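
Equations (B.10) to (B.12) translate almost line by line into NumPy. The sketch below is an illustrative implementation under that reading of the formulas, not the MATLAB code distributed by the authors; the test data, with two independent series sharing one extreme event, are made up to mimic the single-outlier problem discussed in the main text.

```python
import numpy as np

def bicov(x, y):
    """Sample biweight midcovariance, Eqs. (B.10) and (B.11)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m_x, m_y = np.median(x), np.median(y)
    mad_x = np.median(np.abs(x - m_x))
    mad_y = np.median(np.abs(y - m_y))
    p = (x - m_x) / (9.0 * mad_x)
    q = (y - m_y) / (9.0 * mad_y)
    a = (np.abs(p) <= 1.0).astype(float)   # zero weight for extreme x values
    b = (np.abs(q) <= 1.0).astype(float)   # zero weight for extreme y values
    c = 1.0 - p**2
    d = 1.0 - q**2
    num = len(x) * np.sum(a * b * c**2 * d**2 * (x - m_x) * (y - m_y))
    den = np.sum(a * c * (1.0 - 5.0 * p**2)) * np.sum(b * d * (1.0 - 5.0 * q**2))
    return num / den

def bicor(x, y):
    """Biweight midcorrelation, Eq. (B.12)."""
    return bicov(x, y) / np.sqrt(bicov(x, x) * bicov(y, y))

# Two independent series that share one extreme event (made-up data).
rng = np.random.default_rng(0)
x = np.append(rng.standard_normal(200), 50.0)
y = np.append(rng.standard_normal(200), 50.0)

pearson = np.corrcoef(x, y)[0, 1]
robust = bicor(x, y)
```

In this example the Pearson correlation is dominated by the shared outlier, while the biweight midcorrelation gives it zero weight and stays close to zero, as expected for otherwise independent series.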

For robust NLCCA, bicor(u, v) replaces cor(u, v) in (B.3). One can also
replace the $L_2$ norm with the robust $L_1$ norm in (B.6) and (B.7), i.e. replace
the MSE by the MAE in these cost functions.


Abstract. The variability of meteorological observables is known to crucially depend
on the geographical conditions and the considered spatial as well as temporal
scales. In this contribution, we explicitly take the spatial dimension into
account. Recent studies on this aspect have considered individual investigations
of spatially distributed records from different stations, which form network structures
with interesting statistical properties. However, the results of such studies are
strongly influenced by the preprocessing of the time series, in the case of temperature
records particularly by the applied deseasonalisation strategy. As a complementary
approach, we investigate whether the interdependences between pairs of meteorological
records can be used to extract additional information about the regularity of
temporal variations of the regional climate and its potential change with time. As
an alternative to the consideration of univariate estimates of fractal dimensions, the
concept of multivariate dimension estimates is introduced. Different quantitative
measures for the complexity of linear correlations are introduced and thoroughly
compared. After studying the results for stationary model systems, our approach is
used to characterise the variability of temperature records from 13 Japanese meteorological
stations. The complexity of the complete record varies on an annual period
with a larger complexity during the summer season, which is possibly related to the
action of the East Asian monsoonal circulation.

Keywords: Temperature records, Spatio-temporal correlations, Multivariate 
dimension estimates, Variability, Japan 


1 Introduction 

The climate of the Earth is a high-dimensional complex system which is
subjected to different global and local forcings and nonlinear internal feedback
mechanisms that act on very different temporal as well as spatial scales.
Therefore, its behaviour is chaotic [1, 2, 3, 4, 5, 6] and thus characterised by
a strong sensitivity with respect to relatively small changes of certain environmental
parameters. Such changes are known to be able to lead to sudden
transitions in the dynamics of the entire system, with the possible breakdown
of the North Atlantic thermo-haline circulation as probably the best-studied
example [7, 8]. Time series recording the variability of climatological observables
are therefore often characterised by a very strong and irregular variability
and rather high levels of observational as well as "dynamical" noise. This
holds in particular for the case of meteorological data obtained from either
direct measurements since the start of the instrumental period or reconstructions
of earlier time intervals. Moreover, the variability of the corresponding
observables in both measurements and climate models often shows properties
like non-Gaussian probability distribution functions or long-term persistence.

Atmospheric patterns are characterised by scales in both time and space on
which some meteorological quantities like temperature or air pressure vary
only weakly. If one analyses the temporal evolution of such variables at different
locations influenced by the same pattern, it is therefore likely that the
corresponding time series are more or less strongly correlated, with a maximum
correlation at a time lag corresponding to the spatial distance between
the sites and the typical velocity with which the pattern moves in space.
Due to the dynamic evolution of the observed structures during their spatial
motion, the strength of correlations between records decays with increasing
distance between the considered locations. This statement holds in general for
very different spatial as well as temporal scales:

• On a global scale, the interrelationships between sea-level pressure records
obtained from reanalysis data have been utilised to derive a network-like
structure [9]. Similar features are likely to be found in simulations of
climate models as well. However, the behaviour of such models is known to
differ from reanalysis data not only in terms of absolute variabilities and
correlations, but also with respect to their nonlinear features like the
local predictability [10].

• On continental scales (i.e., several hundreds to thousands of kilometers),
simple linear cross-correlation functions may (depending on the particular
geographic situation) not necessarily be an optimal measure for describing
the interrelationships and exactly detecting the delay corresponding to the
maximum dependence between meteorological time series. As an alternative,
one may consider other measures which are sensitive to nonlinear
correlations. Recent results on noisy electrophysiological data demonstrated
that a maximisation of the linear spectral coherence may also be well-suited
for an appropriate detection of delays between different signals [11]. For
temperature and precipitation records, Rybski et al. [12] have suggested
that the concept of phase synchronisation [13] may also be applied for
detecting these time lags. Whereas this suggestion is supported by
investigations of model systems comparing the phase synchronisation and
correlation function approaches [14], it is likely that the concept of
synchronisation analysis may lead to erroneous results if there is no
well-defined high-frequency oscillatory component in the time series
[15, 16, 17].

In this chapter, we analyse long-term daily maximum, minimum, and mean
air temperatures from different meteorological stations distributed all over
Japan, which cover the last 30 years. Complementary to other studies
focussing exclusively on the temporal characteristics of such records, we use
the entire multivariate data set to study the temporally varying complexity
of the spatial correlations. For this purpose, the novel concept of
multivariate dimension estimates is introduced and thoroughly applied to our
data. In addition, we study the mutual spatio-temporal interdependences
between the individual records in order to identify the dynamic skeleton of
the Japanese temperature network. In line with these aims, this chapter is
organised as follows: In Sect. 2, we briefly summarise arguments for an
appropriate preprocessing of the data, which in particular reflects the
problem of deseasonalisation. The approach of multivariate dimension
estimates is introduced in Sect. 3. Finally, in Sect. 4, we apply different
methods for uni- as well as multivariate assessments of the spatio-temporal
correlations of air temperatures in Japan. Possible implications of our
results for the understanding of nonlinear interactions between different
components of the climate system are discussed.


2 Preprocessing and Data Analysis 

In the case of temperature records, there is the conceptual problem that the 
relevant dynamics occur on at least two very different time scales. Besides the 
daily fluctuations on which we would like to focus in this contribution, there is 
the annual cycle that dominates the variation amplitude on longer time scales 
[18]. In order to analyse short-term correlation features of time series of air 
temperatures, it is therefore necessary to apply a sophisticated preprocessing 
of the data which removes the annual cycle component as well as possible
and leaves the short-term variability unchanged. 

In order to separate the dynamics on different time scales, a variety of
approaches can be found in the literature [19]. For example, in order to
remove long-term trends in the case of temperature records, it is convenient
to first apply a long-term moving average filter (with a width of > 1 year)
to the time series, which extracts such trend components. If the extracted
trend is subtracted from the original time series, the remaining signal
(including the annual cycle component) remains almost unchanged.

For the problem of deseasonalisation, i.e., the removal of periodic
long-term components from a time series, there are several statistical
methods which may be roughly divided into the following groups:

• Heuristic methods: In this case, the data are considered separately for
the respective calendar dates. For each day of the year, mean values and
possibly also higher-order moments are computed. By standardising the
original data for every calendar day according to these statistical
quantities, one approaches a deseasonalised series. For example, the
subtraction of the respective mean values for every calendar day corresponds
to the so-called phase averaging method. It has to be noted that the
heuristic methods do not yield a perfect deseasonalisation, as the amplitude
of the annual component may change on both subannual and interannual time
scales [20].

• Nonparametric smoothing methods: Like the heuristic methods, this type 
does not assume any functional form of the annual cycle. Two different 
subtypes may be distinguished: Methods where the exact cycle length is not
taken into account include the traditional unweighted as well as weighted 
moving average filters, whose bandwidths have to be significantly smaller 
than the annual period. However, in this case, the temporal correlations 
may be essentially changed. As an alternative, one may also consider filters 
which explicitly refer to the length of the seasonal period [21, 22]. 

• Spectral methods: In contrast to the heuristic and nonparametric smoothing
approaches, spectral methods implicitly assume a particular shape of the
seasonal cycle. In the most convenient case of a high- or low-pass filter
based on a Fourier transform, a harmonic function with a period of
T = 365.25 days is fitted to the time series by linear regression (i.e., by
minimising the quadratic residual between model and observations) and then
removed from the record. However, the assumed shape of the periodic function
is not necessarily present in natural signals. As an alternative, one may
use a wavelet decomposition of the record and reconstruct a signal by taking
into account only the components which vary on time scales that are
significantly shorter than the annual cycle.

• Empirical mode decomposition (EMD): The concept of empirical mode 
decomposition has been suggested as a novel tool for time-scale separation
in nonstationary systems [23]. By this heuristic algorithmic approach,
the time series is successively decomposed into components which vary 
on significantly different time scales. According to this, the corresponding 
approach might be helpful to filter annual components as well as interannual
long-term trends from meteorological data [24, 25, 26, 27]. However,
in the case of EMD, it is not a priori clear that the extracted long-term 
components do not contain residual information from shorter time scales 
and vice versa, such that additional testing is required before applying 
this method as a standard tool for deseasonalisation in climatological time 
series [28]. 
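As a minimal illustration of two of the approaches listed above, the following Python sketch implements the phase averaging method and the removal of a single fitted harmonic. It is not the preprocessing code used for the results of this chapter. The function names, the use of NumPy, and the inclusion of a constant offset in the harmonic fit are assumptions made for this example only.

```python
import numpy as np


def phase_average_deseasonalise(values, day_of_year):
    """Phase averaging: subtract, for every calendar day, the mean over all years."""
    values = np.asarray(values, dtype=float)
    day_of_year = np.asarray(day_of_year)
    residual = np.empty_like(values)
    for day in np.unique(day_of_year):
        mask = day_of_year == day
        residual[mask] = values[mask] - values[mask].mean()
    return residual


def harmonic_deseasonalise(values, t_days, period=365.25):
    """Fit offset + cosine + sine of the given period by linear least squares
    and subtract the fitted signal (the offset term is an assumption here)."""
    values = np.asarray(values, dtype=float)
    t_days = np.asarray(t_days, dtype=float)
    omega = 2.0 * np.pi / period
    design = np.column_stack(
        [np.ones_like(t_days), np.cos(omega * t_days), np.sin(omega * t_days)]
    )
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coeffs
```

Fitting cosine and sine amplitudes is equivalent to estimating amplitude and phase of one sinusoid, while keeping the regression linear in its parameters.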

In this contribution, we will not perform a detailed evaluation of the
different approaches mentioned above. However, in order to briefly illustrate
the importance of initial deseasonalisation for the outcome of both linear
and nonlinear methods of time series analysis [29], Fig. 1 shows the linear
autocorrelation function

$$ C(\tau) = \frac{\langle (X_t - \langle X \rangle)(X_{t+\tau} - \langle X \rangle) \rangle}{\sigma_X^2} \tag{1} $$

and the mutual information

$$ I(\tau) = \sum_{s, s' \in \mathcal{S}} p_{s s'}(\tau) \log \frac{p_{s s'}(\tau)}{p_s \, p_{s'}} \tag{2} $$

(where $\mathcal{S} = \{(1), \dots, (S)\}$ is a partition of the range of the
considered observable X, in our case the temperature, into S mutually
disjoint classes $s$, $p_s$ is the probability of finding an observation in
class $s$, and $p_{s s'}(\tau)$ is the joint probability of finding two
observations separated by the lag $\tau$ in classes $s$ and $s'$) for a
31-year time series of daily maximum air temperatures recorded in Tokyo,
Japan. For comparison, the results are displayed for the original time series
and the deseasonalised ones after application of the phase averaging method
and after the subtraction of a sinusoidal signal with a period of T = 365.25
days fitted to the data. It can easily be seen that in the original time
series, the annual component clearly dominates the results, whereas the
deseasonalised data are characterised by a successive decay of both linear
and nonlinear correlations as the considered temporal distance between two
observations increases. Note that there are still small residuals of the
annual cycle in the record in the case of a narrow-banded spectral filter,
which can be seen by a small, but significant increase of the linear
correlations at delays of about 365 days.

Fig. 1: Standard Pearson (auto-)correlation function (left) and mutual
information (right) of the daily maximum air temperature record from Tokyo,
Japan, between January 1975 and December 2005. The upper and lower panels
show the same functions on different time scales. Different line styles
correspond to the original data (solid, after subtraction of the yearly mean
obtained with a running one-year moving-average filter), the residual time
series after applying the phase averaging method (dashed), and the time
series resulting from subtraction of a sinusoidal signal with a period of
T = 365.25 days whose phase and amplitude have been estimated by a linear
least-squares approach (dotted).
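The two quantities compared in Fig. 1 can be estimated from a single record along the following lines. This is a sketch under stated assumptions rather than the procedure used for the figure: the partition is taken to consist of `n_bins` equal-width classes, probabilities are estimated by relative frequencies, and the natural logarithm is used. The function names and the parameter `n_bins` are hypothetical.

```python
import numpy as np


def autocorrelation(x, lag):
    """Linear autocorrelation at a non-negative lag, normalised by the total variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[: len(x) - lag] * x[lag:]) / np.sum(x * x))


def mutual_information(x, lag, n_bins=8):
    """Histogram estimate of the mutual information between x(t) and x(t + lag)."""
    x = np.asarray(x, dtype=float)
    a, b = x[: len(x) - lag], x[lag:]
    edges = np.histogram_bin_edges(x, bins=n_bins)  # equal-width partition (assumed)
    joint, _, _ = np.histogram2d(a, b, bins=[edges, edges])
    p_joint = joint / joint.sum()
    p_a = p_joint.sum(axis=1, keepdims=True)  # marginal of x(t)
    p_b = p_joint.sum(axis=0, keepdims=True)  # marginal of x(t + lag)
    mask = p_joint > 0  # empty cells contribute nothing
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))
```

Histogram estimates of the mutual information are biased for short records and fine partitions, so the number of classes should be kept small relative to the record length.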


3 Multivariate Dimension Analysis 

Let us now consider the question of how to quantify the complexity of
interrelationships within or between time series. Within the framework of
self-similar processes, fractal dimensions [30] are often used to quantify
the complexity of univariate time series, with recent generalisations to
multivariate data [31, 32, 33, 34]. However, for real-world systems, the
underlying self-similarity assumption is often violated, such that this
approach may not be capable of giving a sophisticated characterisation of the
dynamics. In the context of univariate climatological time series, Broomhead
and King [35] and Fraedrich [2] independently suggested using the number of
statistically relevant components in the suitably embedded time series as a
measure for dimensionality (and, hence, the complexity of temporal
interrelationships) in terms of the so-called singular system analysis
[36, 37, 38]. Apart from the corresponding problems of defining such an
appropriate embedding [39, 40, 41], the underlying approach may be directly
generalised to the case of multivariate time series and spatio-temporal or
ensemble correlations.

In the following, we will discuss a potential approach to assess the number 
of relevant components in multivariate time series and a possible measure 
for the strength of ensemble correlations or, more generally, spatio-temporal
interdependences. 

3.1 Statistical Decomposition of Multivariate Data Sets 

The characterisation of ensemble correlations by a single statistical
parameter requires an appropriate statistical decomposition of the
corresponding multivariate time series. In principle, this decomposition can
be performed by a variety of different approaches, including purely linear
methods like Karhunen-Loeve decomposition (KLD) (which is often referred to
as principal component analysis (PCA) or empirical orthogonal function (EOF)
method) [42, 43], multi-dimensional scaling (MDS) [44], or, referring to a
separate consideration of patterns in the frequency domain, multi-channel
singular system analysis (MSSA) [45], a combination of the “standard”
singular system analysis (SSA) [35] with PCA. All these methods share the
concept that some matrix (which is suitably constructed from the
observational data) is subjected to a singular value decomposition (SVD),
i.e., is decomposed into its eigenvalues and the corresponding eigenvectors.
In the case of KLD, one makes use of the correlation (or scatter) matrix of
the observed data set. For MDS, a transformed matrix of the squared linear
inter-point distances is used, whereas MSSA is based on a Toeplitz-type
lag-covariance matrix obtained from every univariate component time series.

Whereas the SVD step of all these methods may be performed easily and
computationally efficiently, there are also different nonlinear
generalisations. One possible way to obtain such generalisations is
replacing the Euclidean metric by one defined by the local neighborhood,
e.g., in terms of isometric feature mapping (ISOMAP) [46] or locally linear
embedding (LLE) [47]. An alternative is realising the decomposition in terms
of neural networks, including methods like nonlinear principal component
analysis (NLPCA) [48] or independent component analysis (ICA) [49]. However,
these nonlinear variants require a much larger amount of data for
computation, while the linear methods can be applied to rather short time
series as well. In addition, the methods based on neural networks do not
necessarily lead to well-defined component variances. It therefore has to be
noted that the approach described in the following is not directly
applicable in such cases.

In the following, let us consider the Karhunen-Loeve decomposition as an
example for which the derived components have a rather intuitive
interpretation. As its principal idea was introduced about 100 years ago
(see, e.g., [43] for some historical remarks), KLD is today frequently
applied as a standard method for compressing spatio-temporal data by finding
the largest linear subspace that contains substantial statistical variations
of the data. In the case of observations with N simultaneously measured
variables and M points in time, the $M \times N$-dimensional data matrix $A$
(rescaled from the original observations to have zero mean and unit variance
in any component time series) is used to define the $N \times N$-dimensional
symmetric and positive semidefinite scatter matrix $S = A^T A$. This matrix
can be completely described by its non-negative eigenvalues $\sigma_i^2$
($i = 1, \dots, N$) and their corresponding eigenvectors (which are in
geosciences usually referred to as the empirical orthogonal functions
(EOF)). Without loss of generality, the eigenvalues $\sigma_i^2$ of $S$ may
already be given in decreasing order
$\sigma_1^2 \ge \dots \ge \sigma_N^2 \ge 0$. For our following
considerations, we will use the component variances $\lambda_i$, explained
variances $e_i$, and remaining (or residual) variances $r_i$, which are
defined as

$$ \lambda_i = \frac{\sigma_i^2}{\sum_{j=1}^{N} \sigma_j^2}, \tag{3} $$

$$ e_i = \sum_{j=1}^{i} \lambda_j, \tag{4} $$

$$ r_i = 1 - e_i = \sum_{j=i+1}^{N} \lambda_j. \tag{5} $$

In order to examine the dynamic features of the data, it is common to
additionally study the time-dependence of the amplitudes corresponding to the
respective EOF (if thus considered dynamically, KLD is usually referred to as
principal component analysis (PCA)). However, this approach still exclusively
reflects linear properties.
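The decomposition described in this subsection can be sketched in a few lines of Python. The example assumes a data array with M rows (points in time) and N columns (variables), and it normalises the eigenvalues of the scatter matrix so that the component variances sum to one. The function name `kld_variances` is hypothetical.

```python
import numpy as np


def kld_variances(data):
    """Return component, explained and remaining variances of an (M, N) data array."""
    data = np.asarray(data, dtype=float)
    # Rescale every component series to zero mean and unit variance.
    a = (data - data.mean(axis=0)) / data.std(axis=0)
    scatter = a.T @ a  # N x N, symmetric, positive semidefinite
    # eigvalsh returns ascending eigenvalues; reverse to decreasing order.
    sigma2 = np.clip(np.linalg.eigvalsh(scatter)[::-1], 0.0, None)
    component = sigma2 / sigma2.sum()
    explained = np.cumsum(component)
    remaining = 1.0 - explained
    return component, explained, remaining
```

Because the scatter matrix is symmetric, `numpy.linalg.eigvalsh` is sufficient here and avoids the complex round-off that a general eigenvalue solver may introduce.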

3.2 Karhunen-Loeve Decomposition (KLD) Dimension 

The idea of using Karhunen-Loeve decomposition for estimating the number 
of degrees of freedom in spatially extended systems is already presented in 
[50]. Since in the case of weakly turbulent systems, the same quantity may 
be represented with methods based on fractal dimensions [51] or Lyapunov 
exponents [52, 53], this number is conveniently referred to as “the” dimension 
of the considered system. Following this line of argumentation, one may extend 
the application of Karhunen-Loeve decomposition beyond the purely linear 
point of view described above. 

To determine the number of degrees of freedom in spatially extended systems,
Zoldi et al. [54] introduced the concept of KLD dimension for a quantitative
characterisation of spatio-temporal chaos [55, 56, 57]. The KLD dimension may
be defined as the number of eigenvalues required to capture some specified
fraction $0 < f < 1$ of the total variance $\sum_{i=1}^{N} \lambda_i = 1$ of
the data, i.e.,

$$ D_{\mathrm{KLD}}(f) = \min \{ i : e_i \ge f \}, \tag{6} $$

with the limiting cases $D_{\mathrm{KLD}}(0) = 0$ and
$D_{\mathrm{KLD}}(1) = N$. The value of $D_{\mathrm{KLD}}$ may serve as an
upper bound for the true dimensionality of a system, as the decomposition
into orthogonal components in terms of Karhunen-Loeve decomposition may
yield partially redundant components which might be reduced if a nonlinear
method is applied.
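Read as an algorithm, the KLD dimension is the smallest number of leading components whose cumulative explained variance reaches the fraction f. The following sketch assumes that the component variances are already available. The function name `kld_dimension` and the convention that zero components explain zero variance are choices made for this illustration.

```python
import numpy as np


def kld_dimension(component_variances, f):
    """Smallest number of leading components explaining at least a fraction f."""
    lam = np.sort(np.asarray(component_variances, dtype=float))[::-1]
    lam = lam / lam.sum()
    # Prepend zero so that index i holds the variance explained by i components.
    explained = np.concatenate([[0.0], np.cumsum(lam)])
    explained[-1] = 1.0  # guard against round-off in the cumulative sum
    # argmax on a boolean array returns the first index where it is True.
    return int(np.argmax(explained >= f))
```

With this convention the limiting cases f = 0 and f = 1 return 0 and N, respectively, provided that all component variances are strictly positive.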

It should be noted that the above definition (6) is modified with respect to
the original one introduced by Zoldi and co-workers, who considered
$D_{\mathrm{KLD}}(f)$ to be the maximum number of eigenmodes describing less
than a fraction $f$ of the total variance. This modification is motivated by
the fact that for applications in data analysis, for a given $f$ the minimum
number of modes that explains a given amount of total variance is usually
the quantity of interest. Moreover, this redefinition leads to a more
“natural” behaviour of the KLD dimension at the limiting cases $f = 0$ and
$f = 1$ as described above.

In the case of simulations of spatio-temporally chaotic systems (i.e.,
systems which exhibit self-similarity over a large range of spatial scales),
Zoldi et al. observed (for any $f$) a linear scaling of $D_{\mathrm{KLD}}$
with the system size N. Whereas the KLD dimension is otherwise restricted to
integer values, this finding suggested studying a normalised version, the
KLD dimension density $\delta_{\mathrm{KLD}} = D_{\mathrm{KLD}}/N$ [56],
whose values are bounded to the unit interval. Note that in the case of
systems with a typical scale, the corresponding behaviour of
$D_{\mathrm{KLD}}$ may show a scaling which is restricted to a certain range.
Under certain conditions, there may even exist more than one distinct
scaling region, which corresponds to the behaviour of estimates of fractal
dimensions in systems like the chaotic Rössler oscillator [58]. As an
example from ecology, Wilson and Keeling [59] studied the spatial dynamics
of a predator-prey-resource model and found a different scaling behaviour on
small and large scales. In a similar way, such distinct scalings may be
identified in images of ecological systems [60]. The transition between the
different scaling regions may then be identified as the characteristic
spatial (or temporal) scale of the system.

The KLD dimension has mainly been used to characterise the dynamics of
spatially extended model systems in the extensive chaotic state [54],
spiral-defect chaos [55], and reaction-diffusion systems [56]. Recently,
Varela et al. [57] applied $D_{\mathrm{KLD}}$ for an investigation of
spatiotemporal data from electrochemical oscillator experiments (with
M > 6000 and N = 50). It has been demonstrated that this measure is well
suited for quantifying differences between regular and turbulent states.

To adapt the concept of KLD dimension for the analysis of possibly
nonstationary multivariate time series, one may additionally consider the
temporal variability of the observations for a temporally localised
characterisation of the dynamics. While the consideration of $S$ for the
complete data set loses any temporal information about the variations in the
complexity of interrelationships between the different components (which may
be significant especially if $M \gg N$), a separate computation of the KLD
dimension for sliding windows in time [56] allows a resolution of the
varying complexity down to the scale of N points in time or even below.
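A temporally localised version of the KLD dimension density may be sketched as follows. The window width, the step between successive windows, and the value f = 0.9 are free parameters of this illustration, and both function names are hypothetical. Every window is standardised separately before its scatter matrix is decomposed.

```python
import numpy as np


def kld_dimension_density(window, f=0.9):
    """KLD dimension density D_KLD / N of one (width, N) window, for 0 < f <= 1."""
    a = (window - window.mean(axis=0)) / window.std(axis=0)
    sigma2 = np.clip(np.linalg.eigvalsh(a.T @ a)[::-1], 0.0, None)
    explained = np.cumsum(sigma2) / sigma2.sum()
    explained[-1] = 1.0  # guard against round-off
    d_kld = int(np.argmax(explained >= f)) + 1  # index 0 corresponds to one component
    return d_kld / window.shape[1]


def sliding_kld_density(data, width, step=1, f=0.9):
    """Evaluate the KLD dimension density in windows sliding along the time axis."""
    data = np.asarray(data, dtype=float)
    starts = range(0, data.shape[0] - width + 1, step)
    return np.array([kld_dimension_density(data[s:s + width], f) for s in starts])
```

A window in which all component series follow one common signal yields the smallest possible value 1/N, whereas mutually independent series yield values close to one.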

3.3 Linear Variance Decay (LVD) Dimension 

Whereas the KLD dimension density can be widely applied to characterise
large data sets from spatio-temporally chaotic systems, its direct use for
the characterisation of observational records is problematic in the case of
small data sets (i.e., small N) or time windows (small M) for several
reasons:

• $\delta_{\mathrm{KLD}}$ has a possible range of only N + 1 different,
equally spaced values. Thus, the number of possible values becomes very
small for the considered data. As a consequence, small changes of the
structure of interrelationships between the component time series are
sometimes not detected by this measure, whereas it changes discontinuously
(with a step size of 1/N) when these modifications of the data increase over
a certain threshold. Thus, if N is rather small, only strong changes within
the data are detected by a dramatic change of $\delta_{\mathrm{KLD}}$.

• There is no natural choice of the cutoff parameter $f$, which has to be
specified separately for each application. Thus, it is useful not to
consider $\delta_{\mathrm{KLD}}$ as an absolute, but rather as a relative
dimension density. However, for applications where only a qualitative
detection and description of changes of the complexity of interrelationships
within multivariate data is required, this subtle difference is no major
problem.

• Due to the small amount of observational data in time, certain finite-size
effects have to be expected which may cause any quantitative interpretation
of $\delta_{\mathrm{KLD}}$ to fail.

The above arguments call for the definition of more general estimates for
relative dimension densities, which can already be applied to short
multivariate time series. As one possible approach, one may consider the
scaling of $\delta_{\mathrm{KLD}}$ with the cutoff parameter $f$ by fitting a
suitable parametric function to the respective curve. In this case, one
should note that for a given value of $\delta_{\mathrm{KLD}}(f) = i/N$
($i = 0, \dots, N$), $1 - f$ plays the role of the remaining variances $r_i$
for $i = 1, \dots, N$, with $i/N$ being the relative number of components
considered.

For the component variances $\lambda_i$, the scaling behaviour has been
investigated in some detail for random matrices [61, 62] as well as
real-world geoscientific data [63] in terms of the logarithmic eigenvalue
(LEV) curves or scree graphs (for an overview, see [43]). Commonly, these
graphical methods are used as a simple means of checking whether the
component variances decay sufficiently smoothly, which is an important
prerequisite for a meaningful interpretation of KLD-based dimension
estimates. Furthermore, for a certain class of multivariate random
processes, Preisendorfer has derived analytical results on the scaling of
the eigenvalues $\lambda_i$ [43]. However, in the case of general
multivariate data sets, only the leading eigenvalues of the covariance
matrix are typically considered as being dynamically relevant in terms of
statistically significant orthonormal basis vectors of a certain linear
subspace. The remaining eigenvalues $\lambda_i$ are assumed to represent
stochastic variations and, hence, to follow a distinct scaling law which
depends on the length of the time series. Although this assumption yields a
fundamental restriction to the analysis, it is usually not checked in
applications, for example, by comparing the distribution of all eigenvalues
to that expected for multivariate Gaussian white noise.

In contrast to the component variances $\lambda_i$, there are no studies
analysing the scaling of the remaining variances $r_i$ in some detail.
However, as we will show later for some examples, a rough inspection of the
corresponding values for both random matrices and observational data shows
that the decay corresponding to the major components (i.e., the components
with the highest variances) is often reasonably well approximated by an
exponential decay law. As a consequence, one can make the following ansatz:

$$ r_i = 10^{-i/(N\delta)} \quad \text{for } i \le i_{\max} < N. \tag{7} $$

In this expression, the choice of the decadal logarithm, i.e., an explained
fraction of 90% of the total variance of the data as a reference value, is
motivated by the fact that a corresponding threshold yields a reasonable
number for the effective degree of freedom in spatially extended systems,
cf. [50] (93%), [54] (between 81% and 95%), or [64] (90%). The values of
$\delta \cdot N$ can therefore be considered as an estimate of the degrees of
freedom. As it quantifies the decay of remaining variances of the linear
principal components of a multivariate data set, the scaling coefficient
$\delta$ is called the linear variance decay (LVD) dimension density of the
considered multivariate data set [16, 65, 66].
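Taking the decadal logarithm of the exponential ansatz gives a straight line through the origin, $\log_{10} r_i = -i/(N\delta)$, so that a rough value of the decay coefficient follows from an ordinary least-squares slope. The sketch below is one possible reading of such a fit, assuming that the first `i_max` remaining variances are strictly positive. The function name and argument layout are hypothetical.

```python
import numpy as np


def lvd_density_ols(remaining, n_components, i_max):
    """Fit log10(r_i) = -i / (N * delta) through the origin for i = 1..i_max.

    remaining[k] is expected to hold r_{k+1}; n_components is N.
    """
    i = np.arange(1, i_max + 1)
    y = np.log10(np.asarray(remaining[:i_max], dtype=float))
    slope = np.sum(i * y) / np.sum(i * i)  # least-squares slope without intercept
    return -1.0 / (n_components * slope)
```

For example, with N = 20 and remaining variances that follow the ansatz exactly with a decay coefficient of 0.3, the function returns 0.3, which corresponds to six components explaining 90% of the variance.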

The value of $\delta$ may be roughly estimated by an ordinary linear least
squares approach to Eq. (7) [16]. However, if N is rather small, there are
only a few points available for fitting the respective model function.
Moreover, there are again only N − 1 possible choices of the threshold
$i_{\max}$ for fitting this function (as $r_0 = 1$ and $r_N = 0$ by
definition, an exponential decay law must be subjected to a certain cutoff
at $i_{\max} < N$). To overcome this difficulty and define the model function
with respect to a continuously distributed cutoff parameter $f$ (which is
important when $\delta$ should be considered dynamically), one can make use
of the relationship between $r_i$ and $1 - f$, which is illustrated in panels
(a) and (b) of Fig. 2: reversing the axes in (b) and multiplying
$\delta_{\mathrm{KLD}}$ by N, one approaches a continuously defined
equivalent of the logarithmic representation of the remaining variances in
panel (a) (where the illustrated function is defined only for integer values
of $i$). A scaling law of the KLD dimension density corresponding to that of
the remaining variances then looks as follows:

$$ \delta_{\mathrm{KLD}}(\phi) = -\delta(f) \log_{10}(1 - \phi) \quad \text{for } \phi \in [0, f]. \tag{8} $$


Ordinary linear least squares estimate. As $\delta_{\mathrm{KLD}}(f)$ is
well-defined for $f \in [0, 1]$, the defining expression of the LVD dimension
density allows one to calculate $\delta$ as a function of the maximum
considered value of $f$ for any $f \in (0, 1)$. As a particularly suited
approach, one may apply a continuous least-squares approach minimising the
functional

$$ F_a(f) = \int_0^f \left[ \delta_{\mathrm{KLD}}(\phi) + a \log_{10}(1 - \phi) \right]^2 d\phi = \ln 10 \int_{\log_{10}(1-f)}^{0} \left[ \delta_{\mathrm{KLD}}(x) + a x \right]^2 10^x \, dx \tag{9} $$

with respect to $a$ (here, the transformation $x = \log_{10}(1 - \phi)$ has
been used). One easily convinces oneself that $F_a(f)$ has (for any value of
$f$) a unique global minimum at

$$ \delta^{uw}(f) = - \frac{\int_{\log_{10}(1-f)}^{0} \delta_{\mathrm{KLD}}(x) \, x \, 10^x \, dx}{\int_{\log_{10}(1-f)}^{0} x^2 \, 10^x \, dx}, $$

which is easily computed by separately evaluating the integrals over all
ranges of $x$ where $\delta_{\mathrm{KLD}}(x)$ has a constant value. The
above expression for $\delta^{uw}(f)$ has already been considered as a rough
estimate of the exponential decay scale $\delta(f)$ in [16, 65, 66, 67].
However, in the following, we will show that the properties of this
estimator call for a further improvement.


Normalisation. As follows from its definition, the estimated LVD dimension
density $\delta^{uw}$ behaves in a special way as the maximum considered
variance fraction $f$ goes to 0 or 1, respectively:
$\delta^{uw} \to +\infty$ for $f \to 0$ as $\log_{10}(1 - f) \to 0$, and
$\delta^{uw} \to 0$ for $f \to 1$ because $\log_{10}(1 - f) \to -\infty$ (see
Fig. 2c). From the first observation, it follows that the unweighted
estimate defined above is not appropriately normalised to values within
[0, 1]. In order to correct this fact for the continuous estimate, one may
consider the minimally possible value $\delta^{\min}$ of the estimator
$\delta^{uw}$ (which would occur if $\delta_{\mathrm{KLD}}(f) = 1/N$) by
setting

Weighted linear least squares estimate. Owing to the logarithmic
transformation from a nonlinear to a linear least-squares problem, the
estimated values of $\delta^{uw}$ are not only non-normalised, but also show
a systematic bias compared to the true LVD dimension density. In order to
approach more reliable estimates, one has to either directly use a nonlinear
least-squares approach or explicitly correct the estimator by introducing a
proper weight function. Whereas in the case of an unweighted estimate, all
values of the explained variance up to $f$ contribute equally, the
underlying exponential model implicitly requires a much higher sensitivity
with respect to larger values of $f$ (i.e., low values of the remaining
variance). A proper choice of the weight function which takes this idea into
account is $w(f) = (1 - f)^{-1}$. Introducing this factor into the integrand
in Eq. (9), the exponential factors in the original unweighted estimate
$\delta^{uw}$ are eliminated as $w(x) = 10^{-x}$ after the substitution
described above, i.e.,

$$ \delta(f) = \underset{a}{\operatorname{argmin}} \, F_a(f) = - \frac{\int_{\log_{10}(1-f)}^{0} \delta_{\mathrm{KLD}}(x) \, x \, dx}{\int_{\log_{10}(1-f)}^{0} x^2 \, dx} $$

(with the weight $w$ included in the integrand of $F_a(f)$). Note that if
the exponential terms in the original unweighted functional $F_a(f)$ are
expanded into a Taylor series, the weighted estimate corresponds to the
zeroth-order term in this expansion.
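The weighted estimate can be evaluated in closed form because the KLD dimension density is piecewise constant. With the weight $w = (1 - f)^{-1}$ and the variable $x = \log_{10}(1 - \phi)$, minimising $\int [\delta_{\mathrm{KLD}}(x) + a x]^2 dx$ over $a$ gives $a = -\int \delta_{\mathrm{KLD}}(x)\, x\, dx / \int x^2 dx$, with both integrals running from $\log_{10}(1 - f)$ to 0. The following sketch implements this expression, which is derived here from the verbal description and should be read as an assumption. It does not include the normalisation step, and the function name is hypothetical.

```python
import numpy as np


def lvd_density_weighted(component_variances, f):
    """Weighted least-squares estimate of the LVD dimension density for 0 < f < 1."""
    lam = np.sort(np.asarray(component_variances, dtype=float))[::-1]
    lam = lam / lam.sum()
    n = lam.size
    remaining = np.concatenate([[1.0], 1.0 - np.cumsum(lam)])  # r_0, ..., r_N
    x_f = np.log10(1.0 - f)  # lower integration limit (negative)
    numerator = 0.0
    for i in range(1, n + 1):
        # delta_KLD equals i/N for x between log10(r_i) and log10(r_{i-1}).
        x_hi = np.log10(remaining[i - 1])
        reaches_cutoff = remaining[i] <= 1.0 - f
        x_lo = x_f if reaches_cutoff else np.log10(remaining[i])
        numerator += (i / n) * (x_hi ** 2 - x_lo ** 2) / 2.0  # integral of x dx
        if reaches_cutoff:
            break
    denominator = (-x_f) ** 3 / 3.0  # integral of x^2 from x_f to 0
    return -numerator / denominator
```

If a single component carries the whole variance, the sketch returns $3 / (2 N |\log_{10}(1 - f)|)$, i.e., 0.375 for N = 4 and f = 0.9, which illustrates why an additional normalisation is required.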

Normalised weighted linear least squares estimate. Combining our
normalisation with the weighted linear least squares approach, we finally
end up with the following estimate, which is referred to as the relative LVD
dimension density $\delta_{\mathrm{LVD}}$:

n ~8 kld{x) x dx 

'jir 1 --■ (ID 

Uogi 0 (l—/) 

Using the notation of remaining variances and
$i_{\max} := D_{\mathrm{KLD}}(f) - 1$, one may derive the following
equivalent expression:


Note that unlike the original unweighted estimate $\delta^{uw}$, the value of
$\delta_{\mathrm{LVD}}$ does not depend on the specific choice of a
particular base for the logarithm. However, although its definition
incorporates an appropriate weighting and normalisation, some conceptual
problems remain in interpreting $\delta_{\mathrm{LVD}}$. In particular, the
estimated values do still depend on the maximally explained variance
fraction $f$, which motivates the term relative dimension density. However,
unlike for the KLD dimension density, this dependence is continuous. Despite
this potential point of criticism, the relative LVD dimension density
$\delta_{\mathrm{LVD}}$ can be considered as a meaningful measure for the
strength of linear interdependences between the components of arbitrary
multivariate time series.

3.4 Generalisations of the LVD Dimension Density 

The formalism described above is rather general and might be adapted to 
study the eigenvalues of any symmetric matrix of interaction coefficients. In 
the following, we will give some examples for possible modifications and fields 
of application. 

If the values in the different component time series deviate strongly from
a Gaussian distribution, the consideration of the linear Bravais-Pearson
correlation coefficients as above may lead to biased results. To avoid the
corresponding problems, one may replace these coefficients by a
nonparametric (rank-order) correlation coefficient like Spearman’s Rho [68].
Alternatively, measures of concordance (e.g., Kendall’s Tau [69]) might be
considered. Both measures have the advantage that their values do not depend
on the exact values of the observables and are invariant under any strictly
monotonic transformation of the data. However, in the limit of infinitely
long time series, rank-order correlation and Pearson correlation converge to
each other. Hence, the consideration of any of these correlation
coefficients should yield qualitatively consistent results. Using a matrix
of pairwise rank-order correlation coefficients instead of the standard ones
in our formalism then leads to the non-parametric linear variance decay
dimension density $\delta_{\mathrm{NLVD}}$. This measure has recently been
used to study spatio-temporal interrelationships of river runoffs in a
common catchment [67], which are a typical example of hydro-meteorological
time series with strongly non-Gaussian distributions.
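A rank-order variant of the decomposition may be sketched by replacing every component series by its ranks before computing the correlation matrix, which yields Spearman's Rho for data without ties. Ties are not averaged in this simplified example, and the function names are hypothetical.

```python
import numpy as np


def rank_correlation_matrix(data):
    """Matrix of pairwise Spearman rank correlations of an (M, N) data array."""
    data = np.asarray(data, dtype=float)
    ranks = np.empty_like(data)
    for k in range(data.shape[1]):
        order = np.argsort(data[:, k], kind="stable")
        ranks[order, k] = np.arange(1, data.shape[0] + 1)  # ties are not averaged
    return np.corrcoef(ranks, rowvar=False)


def rank_component_variances(data):
    """Normalised eigenvalues of the rank correlation matrix in decreasing order."""
    sigma2 = np.linalg.eigvalsh(rank_correlation_matrix(data))[::-1]
    sigma2 = np.clip(sigma2, 0.0, None)
    return sigma2 / sigma2.sum()
```

Since only the ranks enter, applying a strictly increasing transformation such as the exponential function to all observations leaves the resulting matrix unchanged.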

The LVD dimension density can also be generalised using measures that
are not exclusively sensitive to linear correlations, but can also
detect non-linear statistical dependences. As an example, one may consider
information variance decay dimension densities $\delta_{\mathrm{IVD}}$, for which the linear
correlation coefficients between the component time series are replaced by the
respective (generalised) mutual information of order $q$ [70].

To evaluate the degree of (phase) synchronisation between more than two 
interacting oscillatory subsystems, one may consider the eigenvalues of matri¬ 
ces of pairwise (phase) synchronisation indices like the mean resultant length 
[71, 72]. In contrast to other measures of multivariate (phase) synchronisation, 
this approach does not explicitly assume a spatially homogeneous synchroni¬ 
sation process, but gives additional information about the potential hetero¬ 
geneity. 
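
As a minimal sketch of how such a matrix of pairwise synchronisation indices might be assembled, the following example computes the mean resultant length of the phase differences for every pair of phase series. The function names and the synthetic phases are assumptions for illustration; how the phases are extracted from real data (e.g., by a Hilbert transform) is not covered here.

```python
import cmath
import math

def mean_resultant_length(phases_a, phases_b):
    """Modulus of the average unit phasor of the phase differences."""
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)

def synchronisation_matrix(phase_series):
    """Symmetric matrix of pairwise mean resultant lengths."""
    n = len(phase_series)
    return [[mean_resultant_length(phase_series[i], phase_series[j])
             for j in range(n)] for i in range(n)]

# Synthetic phases (hypothetical): two phase-locked oscillators, one drifting.
t = [0.01 * k for k in range(2000)]
phi1 = [2 * math.pi * 1.0 * s for s in t]
phi2 = [2 * math.pi * 1.0 * s + 0.7 for s in t]  # constant phase shift
phi3 = [2 * math.pi * 1.3 * s for s in t]        # different frequency

R = synchronisation_matrix([phi1, phi2, phi3])
```

Here `R[0][1]` is close to 1 (perfect locking), while `R[0][2]` is close to 0. The eigenvalues of `R` would then enter the formalism in place of those of the correlation matrix.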

Finally, it is also possible to define optimally lagged variance decay
dimension densities, in which the previously used equal-time interaction
measures (correlation coefficient, mutual information, etc.) are replaced by
the maximum values of the corresponding measures as a function of the time
shift $\tau$ between each pair of component time series.

It has to be noted that the above list of possible generalisations is far from 
being complete and gives rise to a variety of different fields of application. 


3.5 Example 1: Multivariate Gaussian Random Processes 


As a first illustrative example, let us consider the case of multivariate Gaussian
white noise, for which the individual components are pairwise independent
of each other. For such a record, Preisendorfer [43] has already given explicit
expressions for the distribution of the eigenvalues of the covariance matrix.
In particular, as the number of available data becomes large (i.e., $M \to \infty$),
the components are recognised as being pairwise independent, so that the
orthogonal components identified by the Karhunen-Loève decomposition can
be identified with the original components of the record. As these components
have been assumed to have unit variance, it follows that


$$ r_K = 1 - \frac{K}{N}. \qquad (16) $$


In contrast to the exponential decay model assumed by the LVD dimension
density approach, in the case of independent Gaussian random variables, the
residual variances thus decay linearly. In Fig. 2, it is shown that this leads to
a systematic deviation already for $M = 100$, which however mainly influences
the residual variances at large component orders. For the leading orders, in
the case of finite time series the exponential model is still a reasonable
approximation. However, for $M = 100$ this is no longer true for the range
of $r_j < 10\%$, where the corresponding approximation breaks down.
Consequently, in this range, the different estimates of the LVD dimension density
are not very reliable. In general, we therefore recommend always discussing
the goodness-of-fit of the exponential model in addition to the estimated
decay scale. In particular, in order to identify signatures of stochasticity, such
goodness-of-fit statistics may be used as corresponding parameters.

Fig. 2: Results of the linear statistical decomposition and multivariate dimension
analysis for $N = 25$ pairwise independent Gaussian white noise components with a
length of $M = 100$ data points. (a) Logarithmic representation of the component
variances $\lambda_i$ (dashed) and remaining variances $r_i$ (solid), including the $\pm 3\sigma$
significance levels estimated from a set of 100 realisations. (b) KLD dimension density
$\delta_{\mathrm{KLD}}$ in dependence of the explained variance fraction $f$ (the scaling of the axis
has been chosen for better visibility of the different values and corresponds to
a linear scaling in $-x$ after the substitution described in Sect. 3.3). (c,d) Different
estimates of the “absolute” and relative LVD dimension density as described in the
text. Note that the values differ from those given in [16] due to the choice of the
decadic instead of the natural logarithm.

The finite-size corrections to the eigenvalue distributions might be used to
theoretically derive correction terms to the scaling of the residual variances 
for finite M as well. However, this is beyond the scope of the presented work. 
With respect to real-world applications from the geosciences, an application 
to more complicated stochastic processes such as multivariate auto-regressive 
processes might also be of considerable interest [73]. We plan to study the 
corresponding questions in our future research. 
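
The mismatch between a linear decay of the residual variances and the exponential model can be illustrated numerically. The sketch below assumes the idealised large-$M$ form $r_K = 1 - K/N$ for $N$ independent unit-variance components and fits a straight line to $\ln r_K$. The function names and the choice of the "leading" half of the orders are assumptions for illustration, and the coefficient of determination is used here as one possible goodness-of-fit statistic, not necessarily the one used by the authors.

```python
import math

def residual_variances(n_components):
    """Idealised residual variances r_K = 1 - K/N for K = 0, ..., N-1."""
    return [1.0 - k / n_components for k in range(n_components)]

def loglinear_fit(orders, variances):
    """Least-squares line through (K, ln r_K); returns slope and R^2."""
    ys = [math.log(v) for v in variances]
    n = len(orders)
    mx, my = sum(orders) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in orders)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(orders, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)

N = 25
r = residual_variances(N)
orders = list(range(N))

# Fit restricted to the leading orders versus a fit over all orders.
slope_lead, r2_lead = loglinear_fit(orders[:13], r[:13])
slope_all, r2_all = loglinear_fit(orders, r)
```

In this toy setting, the exponential model describes the leading orders well (`r2_lead` close to 1), whereas the fit over all orders is visibly poorer, in line with the breakdown at small residual variances described above.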

3.6 Example 2: The Politi-Witt Model Revisited 

The example of stochastic component time series with Gaussian distributions
discussed above is rather generic. In contrast to this, many observational data
from geoscientific systems are likely to have some deterministic, but possibly
high-dimensional chaotic components. To demonstrate the power of our
multivariate dimension analysis approach, we will reconsider a model system
which approximates the behaviour of spatio-temporal chaos with a prescribed
dimension density $d \in [0,1]$. The corresponding model was originally
introduced in [74] and has already been studied in [16, 65, 66] to test the
applicability of multivariate dimension estimates.

Let $\{F_1, \ldots, F_n\}$ be the basis of a sufficiently high-dimensional Fourier
space whose elements may be expressed as



(17) 


where $[\cdot]$ denotes the integer part and $j = 1, \ldots, N < n$ gives the “spatial”
position on a regular one-dimensional lattice, which is used to construct a
multivariate data set as follows:


$$ X_{ij} = \sum_{k=1}^{dn} \xi_{ik} F_k(j) \qquad (18) $$


Here, $\xi_{ik}$ (where $i = 1, \ldots, M$ corresponds to the position in time) is a set
of random numbers taken from an appropriate distribution. If $|\xi_{ik}| < 1$, the
set of values $X_{ij}$ is contained in a $dn$-dimensional hypercube and forms an
$M \times N$ data matrix. If $M$ is sufficiently large, the eigenvalues
$\lambda_i$ of the associated covariance matrix (which has a Toeplitz structure) show
an abrupt decay at the component index $dn$, corresponding to the dimension
of the underlying hypercube [16, 65]. However, even under these conditions,
the decay of the remaining variances can be approximated by our exponential
model with reasonable accuracy.

Following [16, 65, 66], let the $\xi_{ik}$ be taken from a uniform distribution on
$[-3^{1/3}, 3^{1/3}]$. This setting corresponds to the system originally studied by [74]
both analytically and numerically. A detailed investigation of the dependence
of the unweighted estimate $\delta_{\mathrm{uw}}$ on both the length of the time series $M$
and the true dimension density $d$ can be found in [16, 66]. In particular,
in the aforementioned references, it has been shown that in the case of an
appropriately chosen value of $f$, the KLD dimension density $\delta_{\mathrm{KLD}}$ may give a
slightly better quantitative estimate of the true dimension $d$ than $\delta_{\mathrm{LVD}}$, while
both characteristics converge to an asymptotically constant value if the length
of the component time series becomes sufficiently large. However, whereas in
the long-term limit, both types of dimension estimates give reasonable values,
the LVD dimension density is clearly superior for detecting small changes
within the system. The latter observation is of particular importance for a
possible short-term characterisation of geophysical time series in the case of
non-stationary conditions.



Fig. 3: As Fig. 2, for the Politi-Witt model with a true dimension density of d = 0.5. 


Comparing the behaviour of the different estimates with that in the case
of multivariate random processes (see Figs. 2 and 3), one may observe that
the qualitative behaviour is rather similar for both types of systems. However,
for similar values of the variance threshold $f$, for the Politi-Witt model
all dimension estimates are significantly lower compared to the case of
multivariate noise. This lower dimensionality is obviously related to a lower
dynamic complexity, which is reflected by a lower number of statistically
significant components. Note that both KLD dimension density and relative LVD
dimension density do not change very much between $f = 90\%$ and $99\%$ in the
case of the Politi-Witt model. In particular, in Fig. 3, it can be seen that their
values in the considered range are rather close to the “theoretical” dimension
density of $d = 0.5$. In contrast to this, in the case of a completely stochastic
system, all estimates increase significantly within this range of $f$ and show
only a very slow convergence towards values near 1 as $f$ becomes large.
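
The contrasting dependence on $f$ can be made concrete with two idealised eigenvalue spectra. The sketch below assumes that the KLD dimension is the smallest number of leading components whose variances sum to at least the fraction $f$ of the total variance, and that the density is this number divided by $N$; the authoritative definition is the one given in Sect. 3. The function name and both spectra are hypothetical.

```python
def kld_dimension_density(eigenvalues, f):
    """Fraction of components needed to explain at least a share f of variance."""
    spectrum = sorted(eigenvalues, reverse=True)
    total = sum(spectrum)
    cumulative = 0.0
    for count, value in enumerate(spectrum, start=1):
        cumulative += value
        if cumulative >= f * total:
            return count / len(spectrum)
    return 1.0

# Idealised spectra (assumptions): flat for independent noise,
# sharp cut-off at half of the components for a system with d = 0.5.
white_noise = [1.0] * 25
sharp_cutoff = [0.1] * 10 + [0.0] * 10

noise_densities = [kld_dimension_density(white_noise, f) for f in (0.9, 0.99)]
cutoff_densities = [kld_dimension_density(sharp_cutoff, f) for f in (0.95, 0.99)]
```

For the flat spectrum the density keeps growing with $f$ (0.92 at $f = 0.9$ and 1.0 at $f = 0.99$), whereas for the spectrum with a sharp cut-off it stays at 0.5 over the tested range.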

4 Correlations of Japanese Temperature Records 

In the following, we will study the spatio-temporal correlations between air
temperature records across the Japanese islands. A sketch of the Japanese
archipelago is shown in Fig. 4, including the approximate locations of the 13
meteorological stations which we will use in our analysis. The data contain daily
minimum, maximum, and mean air temperatures for the time interval
between 1975 and 2005 provided by the Japan Meteorological Agency. It has
to be noted that the covered area is rather large and includes the small
southern islands with a tropical climate, whereas the northernmost main island
Hokkaido is characterised by mild conditions during the summer, but rather
cold temperatures during the winter season. The regional climate system is
strongly influenced by the East Asian monsoonal circulation, which roughly
means a very high humidity in summer and significantly drier conditions in
winter.

Fig. 4: Map of the Japanese archipelago, including the approximate locations of the 13
meteorological stations used in this study.

According to our results from Sect. 2, a sophisticated preprocessing of 
all time series is necessary before any further analysis. As our corresponding 
analysis revealed that the linear statistical features do not depend significantly 
on the particular deseasonalisation approach, in the following, we will consider 
all time series to be subjected to the phase averaging method after a removal
of long-term trends extracted by a one-year moving average filter.
Moreover, we standardise all data afterwards in the usual way to have zero 
means and unit variances. 
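
The preprocessing chain described above may be sketched as follows. This is only an illustration under stated assumptions: phase averaging is taken to mean subtracting, for each position within the annual cycle, the average over all years at that position (the precise method is described in Sect. 2); the moving average is truncated at the series edges; leap days are ignored; and the toy series uses a period of 12 samples instead of 365 days. All function names are hypothetical.

```python
import math
import random

def moving_average(series, window):
    """Centred moving average; the window is truncated at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def phase_average_anomalies(series, period):
    """Subtract from each value the mean over all values sharing its phase."""
    sums, counts = [0.0] * period, [0] * period
    for i, value in enumerate(series):
        sums[i % period] += value
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [value - means[i % period] for i, value in enumerate(series)]

def standardise(series):
    """Rescale to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / std for v in series]

def preprocess(series, period):
    """Detrend, remove the mean seasonal cycle, then standardise."""
    trend = moving_average(series, period)
    detrended = [v - t for v, t in zip(series, trend)]
    return standardise(phase_average_anomalies(detrended, period))

# Toy record (hypothetical): seasonal cycle + linear trend + noise.
random.seed(0)
PERIOD = 12
raw = [10.0 * math.sin(2 * math.pi * i / PERIOD) + 0.05 * i + random.gauss(0.0, 1.0)
       for i in range(20 * PERIOD)]
clean = preprocess(raw, PERIOD)
```

After this treatment, the mean over every phase of the cycle vanishes and the series has zero mean and unit variance, so that products of two such series average to their correlation coefficient.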

As a first step of our analysis, let us consider the linear (Bravais-Pearson)
equal-time cross-correlation coefficients

$$ C_{XY} = \frac{1}{M} \sum_{i=1}^{M} X_i Y_i \qquad (19) $$

between all pairs $(X, Y)$ of records in our data set. In addition, we define the
correlation distances $d_{XY} = 1 - C_{XY}$ between all stations, which are then used
for a one-dimensional agglomerative cluster analysis with the single-linkage
method. Our corresponding results are shown in Fig. 5. It turns out that there
are three major groups of stations, whose mutual correlations are most pro¬ 
nounced in the case of daily maximum and mean temperatures. These three 
groups can easily be attributed to specific geographical regions: Hokkaido 
and Northern Honshu (stations 1-4), Western Honshu/Kyushu/Shikoku (7-9), 
and the Southwestern Islands (10-12). In addition, there are the special cases 
of Tokyo, Mount Fuji, and Minamitorishima (also known as Marcus Island)
which show more specific temperature variations that can be explained by 
their special geographical features (metropolis region, mountain, small iso¬ 
lated island). In general, the observed correlations in the mean temperature 
records are significantly stronger than those between the respective maximum 
or minimum temperatures. With respect to the reported geographical clusters, 
the daily minimum values reveal a less pronounced structure, as in particular 
the records from the smaller southwestern islands are much less correlated 
with each other than the corresponding mean or maximum temperatures. 
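
The combination of correlation distances and single-linkage clustering can be sketched in a few lines. The example assumes standardised records, so that the equal-time correlation is the mean of the products, and uses four synthetic "stations" forming two obvious groups. The function names and data are hypothetical, and the naive clustering loop is meant for a handful of stations only.

```python
import math

def standardise(series):
    """Rescale to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / std for v in series]

def correlation(x, y):
    """Equal-time correlation of two standardised series (mean of products)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

def single_linkage(dist):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Synthetic records (hypothetical): stations 0/1 and 2/3 vary together.
n = 1000
records = [
    [math.sin(0.10 * i) for i in range(n)],
    [math.sin(0.10 * i + 0.2) for i in range(n)],
    [math.sin(0.37 * i) for i in range(n)],
    [math.sin(0.37 * i + 0.2) for i in range(n)],
]
z = [standardise(rec) for rec in records]
dist = [[1.0 - correlation(a, b) for b in z] for a in z]
merges = single_linkage(dist)
```

The first two merges join the strongly correlated pairs at small correlation distances, and the last merge connects the two groups at a distance close to 1, which is the kind of structure a dendrogram visualises.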

Fig. 5: Map of the mutual equal-time correlation coefficients (left panels) and the
resulting dendrograms of a one-dimensional agglomerative cluster analysis based on
the correlation distance (right panels) for the daily maximum, minimum, and mean
temperatures (from top to bottom) recorded at 13 Japanese stations (see Fig. 4)
between January 1975 and December 2005. In the dendrograms, the identified
clusters are visualised by thicker lines, which correspond to larger mutual correlation
coefficients. Station 13 (Minamitorishima) is only very weakly correlated with
the other locations and therefore not shown in the correlation maps.

As already stated in the introduction, the spatial extent of the studied
area means that simple climatological patterns influence different
locations at different times. Consequently, the study of equal-time correlations
as above does not necessarily yield an optimum representation of the mutual
correlation pattern of our records. As an alternative, we consider the maximum
values of the cross-correlation functions

$$ C_{XY}(\tau) = \frac{1}{M} \sum_{i=1}^{M} X_{i+\tau} Y_i \qquad (20) $$

that may occur for $\tau \neq 0$. For this purpose, we use the complete information
contained in these functions and search for the maximum values by
approximating the available values (for integer numbers of the respective delay $\tau$
in a basic unit of one day) by cubic spline interpolation. Indeed,
the resulting correlation patterns are qualitatively consistent with those of
the equal-time correlations, but include additional information in terms of
the “optimal” delay between each pair of stations (see Fig. 6).
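
The search for the optimal delay can be sketched as follows. Note the deliberate simplification: instead of the cubic spline interpolation used in the text, this example refines the integer-lag maximum with a three-point parabolic fit, which is a simpler stand-in and not the authors' procedure. The cross-correlation is averaged over the overlapping part of the two standardised series only; function names and test data are hypothetical.

```python
import math
import random

def standardise(series):
    """Rescale to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / std for v in series]

def cross_correlation(x, y, lag):
    """Mean of x[i + lag] * y[i] over the overlapping samples."""
    n = len(x)
    if lag >= 0:
        pairs = zip(x[lag:], y[:n - lag])
    else:
        pairs = zip(x[:n + lag], y[-lag:])
    products = [a * b for a, b in pairs]
    return sum(products) / len(products)

def optimal_lag(x, y, max_lag):
    """Lag and value of the cross-correlation maximum, refined by a parabola."""
    lags = list(range(-max_lag, max_lag + 1))
    values = [cross_correlation(x, y, lag) for lag in lags]
    k = max(range(len(lags)), key=lambda i: values[i])
    best_lag, best_value = float(lags[k]), values[k]
    if 0 < k < len(lags) - 1:
        left, mid, right = values[k - 1], values[k], values[k + 1]
        curvature = left - 2.0 * mid + right
        if curvature < 0.0:
            shift = 0.5 * (left - right) / curvature  # vertex of the parabola
            best_lag += shift
            best_value = mid - 0.25 * (left - right) * shift
    return best_lag, best_value

# Toy data (hypothetical): y is x advanced by three samples.
random.seed(42)
base = [random.gauss(0.0, 1.0) for _ in range(503)]
x = standardise(base[:500])
y = standardise(base[3:503])
best_lag, best_value = optimal_lag(x, y, 10)
```

In this example the maximum is recovered near a lag of three samples with a correlation close to 1. Applying `optimal_lag` to every pair of stations would yield both the matrix of maximum correlations and the matrix of optimal delays.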

In order to give a condensed view of the correlation pattern of the Japanese
climatology, we suggest combining the information about the maximum
correlation and the corresponding temporal delay into one graphical
representation. For this purpose, we propose a visualisation in terms of a “correlation
network” which is shown in Fig. 7. In this representation, links between two
“nodes” (here: meteorological stations) located in a two-dimensional plane
are shown in terms of three-dimensional arrows whose colors and heights give
information about the minimum correlation distances $d_{XY}$ (obtained from
the maxima of the cross-correlation functions) and the corresponding optimal
delays $\tau$, respectively. This network representation is inspired by the
three-dimensional visualisation of air traffic networks as well as the “climate
networks” recently studied by Tsonis and co-workers [9, 75, 76]. In our case,
a detailed inspection shows that there is a strong coincidence between high
correlations and low delays, which is a characteristic feature of records from
stations with a relatively small spatial distance, i.e., the geographical clusters
already identified above.

Our presented approach can be easily generalised to other measures of 
interdependences, including non-parametric rank-order correlation functions 
based on Spearman’s Rho or Kendall’s Tau, nonlinear mutual information, or 
generalised correlation functions obtained from recurrence plots. For a review 
of these approaches with applications to hydrological records (river runoffs 
from different gauges in a common catchment), we refer to Ref. [67]. In this 
work, we will further focus exclusively on the linear correlation properties. 

Whereas up to this point, we have restricted our interest to the matrix of 
mutual correlations between the temperature records from different stations, 
in the following we will go one step further. In particular, we will consider the 
entire multivariate data set as a whole and investigate its statistical properties 
by means of the multivariate dimension estimates introduced in Sect. 3. In order
to distinguish this approach from the consideration of mutual correlations,
we will refer to it as ensemble correlations [67].

Fig. 6: Map of the maximum values of the cross-correlation functions (left panels)
and the corresponding optimal mutual delays (in days) of the considered records
(right panels) for the daily maximum, minimum, and mean temperatures (from top
to bottom) recorded at 12 Japanese stations (without Minamitorishima) between
January 1975 and December 2005.

Fig. 7: Network representation obtained from the Japanese daily maximum
temperature records from 12 meteorological stations (without Minamitorishima). The color
of the arrows corresponds to the minimum correlation distance (or, alternatively,
the maximum of the cross-correlation function), whereas the height represents the
mutual delay between two stations. Note that in this representation, there is no
information about the direction of this delay.

Figure 8 shows the eigenvalues and remaining variances of the covariance
matrices for daily maximum and minimum temperatures. It can be seen that
for both observables, the decay of the residual variance is reasonably well
approximated by an exponential function (even better than in the case of the two
model systems discussed in the previous section). In addition, the resulting
KLD and relative LVD dimension densities are shown as a function of the
explained variance fraction $f$. Considering the resulting values for typical choices
of $f$ of about 0.9 to 0.95, the estimated dimension densities are between the
values of the two model systems. This indicates that although the mutual
correlations between the individual records are rather strong, there are residual
fluctuations that resemble stochastic components rather than signatures of
complex deterministic behaviour.

In Fig. 9, we show the resulting values of the relative LVD dimension density
calculated for sliding windows of 14 or 28 days width, respectively. One may
clearly observe that the
estimated dimensionality of the data varies with an annual period, although
the annual cycle components have been filtered out by the previous
deseasonalisation. This observation suggests that the spatial correlations between the
different locations are different during different seasons, relating to the
large-scale atmospheric circulation patterns influencing Eastern Asia. In particular,
the summer conditions are characterised by a significantly larger
dimensionality, i.e., a larger number of dynamically significant patterns, which possibly
relates to a common influence of the monsoon. However, one has to mention
that our analysis does not yet provide enough evidence for a corresponding
conclusion. In order to present a more detailed explanation, a deeper analysis
of the corresponding spatial patterns is necessary in terms of the spatial EOFs
obtained from time series from a much larger set of meteorological stations.

Fig. 8: Left panels: Normalised eigenvalues $\lambda_i$ (dashed) and residual variances
(solid) of the covariance matrices of the 31-year records of daily maximum (top) and
minimum (bottom) temperatures from 13 Japanese meteorological stations. Right
panels: The corresponding values of the KLD (solid) and relative LVD (dotted)
dimension density in dependence on the explained variance fraction $f$.

In order to study the temporal evolution of the annual component of the 
variations in the dimensionality of our data, a continuous wavelet analysis 
has been performed. The results shown in Fig. 9 reveal that the strength of 
the “locking” to the annual cycle of insolation indeed significantly varies over 
the entire length of the record, with a maximum coherence in 1987/88 and a 
minimum coherence around winter 1993/94. We have tried to correlate this 
locking to different possible influences like the El Niño-Southern Oscillation
or the solar activity cycle. However, no substantial indication has been found
so far that any of these phenomena causes the quantitative variations of the
observed locking to the annual period in a direct or indirect way.

Fig. 9: Left panels: Relative LVD dimension density (for $f = 0.95$) obtained
from maximum air temperature time series of 12 Japanese stations (without
Minamitorishima) calculated for running windows of 14 (upper panels) and 28 days
(lower panels) width, respectively, for the time interval between January 1975 and
December 2005. Right panels: Corresponding wavelet spectrograms, estimated with
a complex Morlet wavelet. The white lines indicate the respective cones of influence.

5 Summary and Outlook 

In this work, we have presented some conceptual ideas for the investigation 
of correlations between spatially distributed climatological time series. As an 
example, we have studied a set of daily temperature records from different 
Japanese meteorological stations. On the one hand, the mutual correlations 
between all pairs of stations have been considered. As a novel point of view, 
we have introduced the ideas of climatological correlation networks and
correlation cluster analysis, which allow one to derive qualitative as well as
quantitative statements about the significance of such mutual interdependences. On
the other hand, we have theoretically developed a new class of multivariate
dimension estimates, which can be used to characterise the complexity of
interrelationships between the component time series in multivariate data sets.
In particular, our new estimates are well suited for a dynamic characterisation
of temporally varying ensemble correlations. With respect to our considered 
example, our approach allows a unified assessment of the nonlinear dynamics 
of a regional climate system. 

One has to mention that our presented analyses have been based on lin¬ 
ear tools (correlation analysis, principal component analysis), although these 
concepts have been extended using ideas from nonlinear dynamics. This pro¬ 
cedure is apparently in conflict with the inherently nonlinear nature of the 
climate system. Hence, the statistical uncertainty of correlations triggered by 
non-normality, sampling errors, and other features of the data may cause prob¬ 
lems in quantitatively interpreting our results. In future studies, it is therefore 
necessary to explicitly take potential errors in the estimated correlations into 
account, in particular, by statistically quantifying them in terms of confidence 
intervals. 

As a potential disadvantage of the concept of multivariate dimension es¬ 
timates, one has to mention that this integrated view on the complexity of 
mutual correlations loses information about the detailed spatial variability 
patterns. To compensate for this, we would like to mention only two
possible approaches: (i) a combination with an analysis of the empirical or¬ 
thogonal functions (EOF) resulting from principal component analysis or an 
alternative statistical decomposition, and (ii) the generalisation of the pre¬ 
sented concept to univariate dimension estimates. For the latter purpose, it 
is possible to use properly embedded univariate time series whose eigenvalues 
are then separately quantified with our approach (i.e., the KLD step in our 
analysis is substituted by singular system analysis). A corresponding approach 
may be used to quantify the spatial variations of the complexity of meteoro¬ 
logical records, which is usually done in terms of fractal theory by calculating 
similarity or correlation dimensions. However, the use of SSA-based estimates
is far less demanding with respect to the required amount of data.

In order to further validate our results, additional information from a 
larger set of stations with a larger time coverage is necessary. If such data 
become available, it will be of particular interest to extend our approach of 
climatological correlation networks to a detailed investigation of the statistical 
properties of such networks. Similar studies have been recently performed by 
Tsonis and co-workers [9, 75, 76] based on reanalysis data and variations of 
distinct climatological oscillation indices. However, as several recent studies 
suggest that the nonlinear properties of direct observations, reanalysis data, 
and climate models may differ from each other, it might be of considerable 
interest to compare not only the “traditional” linear and non-linear measures, 
but also the characteristics of the resulting networks.

Abstract. The characterisation and quantification of long-term sea-level variability
is of considerable interest in a climate change context. Long time series from coastal 
tide gauges are particularly appropriate for this purpose. Long-term variability in 
tide gauge records is usually expressed through the linear slope resulting from the 
fit of a linear model to the time series, thus assuming that the generating process is 
deterministic with a short memory component. However, this assumption needs to 
be tested, since trend features can also be due to non-deterministic processes such as 
random walk or long range dependent processes, or even be driven by a combination 
of deterministic and stochastic processes. Specific methodology is therefore required 
to distinguish between a deterministic trend and stochastically-driven trend-like fea¬ 
tures in a time series. In this chapter, long-term sea-level variability is characterised 
through the application of (i) parametric statistical tests for stationarity, (ii) wavelet 
analysis for assessing scaling features, and (iii) generalised least squares for estimat¬ 
ing deterministic trends. The results presented here for long tide gauge records in the 
North Atlantic show, despite some local coherency, profound differences in terms of 
the low frequency structure of these sea-level time series. These differences suggest 
that the long-term variations are reflecting mainly local/regional phenomena. 

Keywords: Sea-level, Stationarity, Scaling, Trend assessment 


1 Introduction 

Sea-level is a fundamental geophysical parameter that is relevant for many 
geosciences sub-disciplines including geodesy, oceanography, marine biogeo¬ 
sciences and climatology. Sea-level, or the height of the sea surface above a 
reference level, is measured by tide gauges at coastal sites and through radar 
altimeters on-board satellite platforms. 

Tide gauges are the only historical source of precise sea-level measure¬ 
ments, some dating back to the 19th century. Tide gauge data have been 
traditionally used for navigational purposes, for the prediction of tides at a 
given location, and in the definition of levelling systems [1]. The increasing
interest in environmental issues and climate change in the second half of the
20th century led to a renewed interest and new applications of tide gauge
measurements, including the study of currents, storm surges and sea-level 
extremes, and the estimation of sea-level change (e.g. [2, 3, 4]). 

Satellite altimetry [5] has the enormous advantage of being able to mea¬ 
sure sea-level at a global scale, yielding a uniform space-time dataset of sea 
surface heights. However, the conversion of the radar measurements into an
estimate of the height of the sea surface involves a large number of steps and
therefore a considerable number of potential error sources, including errors 
in the geophysical corrections applied to the satellite measurements, errors in 
recovering the orbit of the satellite, and instrumental drifts affecting the sta¬ 
bility of the altimeter [6, 7]. Furthermore, high quality and continuous satellite 
altimetry measurements are only available since 1993, hindering the analysis 
of long-term variability from satellite time series. 

Sea-level is considered a key indicator of climate change and an impor¬ 
tant observational constraint for global climate models [8]. From a climate 
change perspective, the quantification of long-term sea-level variability is of 
paramount importance. According to the 4th assessment report of the IPCC 
[9], there is high confidence that the rate of sea-level rise has increased between 
the mid-19th and mid-20th centuries. For the 1993-2003 period, the rate of 
sea-level rise derived from satellite altimetry is significantly higher than the 
average rate, but not unprecedented, as concluded from inspection of the tide 
gauge record [10, 11]. Thus, it is unknown whether the higher rate from 1993 
to 2003 is due to decadal variability or an increase in the longer-term trend. 

In nearly all geosciences problems, the interaction between mathematical/ 
statistical methodology and application-specific knowledge is vital for scien¬ 
tific advancement. However, a fruitful interplay between geosciences (physical) 
and time series analysis (statistical) perspectives is seldom easy to achieve, as 
emphasised in the still pertinent insight of Sir Gilbert Walker in 1927: “There
is, today, always a risk that specialists in two subjects, using languages full
of words that are unintelligible without study, will grow up not only, without
knowledge of each other’s work, but also will ignore the problems which require
mutual assistance” [12]. Although it is not straightforward, narrowing the gap
between the physical and statistical perspectives is essential for the charac¬ 
terisation of long-term sea-level variability. From a geoscientific point of view, 
the main question is whether sea-level is rising or falling at a specific site, or 
in a given area, or globally. This question is often translated into the goal of 
determining long-term variability or even more often the “trend” in sea-level. 
However, from a time series analysis point of view, although a trend seems 
to be a feature that can be easily recognised in a time series plot, it lacks a 
precise, rigorous definition. A caricatured definition of trend is given by [13]: 
“a trend is a trend is a trend ...”. In much of time series literature, trend is 
conceived as that part of a series which changes relatively slowly over time, 
and loosely defined as “long-term change in the mean level”; what is meant 
by long-term involves a subjective assessment, and different authors use the 
term trend in different ways (e.g. [14, 15]).

An alternative to the subjective notion of trend in time series analysis is
to consider the somewhat opposite concept of stationarity. Stationarity is not 
only a mathematically well defined property of a time series, but actually a 
very fundamental one, since most methods of time series analysis are based 
on the theory of stationary stochastic processes. The geoscientific question of 
determining the trend in sea-level time series can therefore be translated into 
a first statistical question of whether the sea-level time series are stationary 
(no trend). 

For a time series that cannot be considered stationary, the next obvi¬ 
ous statistical question concerns the type of nonstationarity. Characterising 
a time series as nonstationary does not translate directly into having a de¬ 
terministic (often linear) trend, although this notion is still quite common 
in geosciences, and in particular in sea-level research. In fact, many different 
processes, including deterministic, random walk, and long range dependent 
processes, can engender trend or trend-like features in a time series. Specific 
methodology is therefore required to distinguish between a deterministic trend 
and stochastically-driven trend-like features in a time series. 

In most geoscience problems, the need to quantify long-term variability
prompts the computation of linear trends through ordinary linear regression.
However, from a time series analysis point of view, the estimation of a
deterministic trend requires the time series character of the data to be
taken into account in the regression framework. Autocorrelation is a
ubiquitous feature of most geoscience time series, and therefore needs to be
appropriately included in the estimation procedure.

In this chapter, the characterisation of long-term variability in sea-level 
is addressed from a time series analysis perspective. The methodology is
described in Sect. 2. Time series of sea-level heights are first tested for
stationarity through parametric statistical tests (Sect. 2.1). The scaling properties
of the series are then examined in the wavelet domain (Sect. 2.2) in order to 
assess persistent or long-memory features. The estimation of linear trends in 
a time series context is considered in Sect. 2.3. Results on the characteristics 
of North Atlantic long-term sea-level variability are presented in Sect. 3 and 
discussed in Sect. 4. 


2 Characterisation of Long-Term Variability 

Long-term variability in sea-level records is often expressed through the
linear slope resulting from the fit of a (deterministic) linear trend model to
the sea-level time series (e.g. [16, 17]). This implicitly assumes that the
process generating the sea-level time series is deterministic with a short
memory stochastic component. This assumption needs to be tested, since trend
features can also be due to non-deterministic processes such as random walk or
long-range dependent processes, or even be driven by a combination of
deterministic and stochastic processes. The application of parametric
statistical tests for discriminating between stationarity (no trend), a
deterministic linear trend and a stochastic trend in the form of a random walk
is addressed in Sect. 2.1. However, the discrimination between a deterministic
trend and a stochastic alternative exhibiting significant low frequency
variability, such as long range dependence, can be particularly challenging in
the time domain and is more easily handled in the wavelet domain. The wavelet
spectrum is blind to deterministic trends and is particularly useful for
assessing the scaling features of a time series (e.g. [18, 19]), complementing
the parametric statistical tests. Such a wavelet-based approach is considered
in Sect. 2.2. Finally, if the parametric tests and the wavelet analysis
indicate that the assumption of a deterministic trend is plausible, the
estimation procedure needs to take into account the time series nature of the
data (Sect. 2.3).

2.1 Stationarity 

The concept of stationarity plays a key role in time series analysis and is a basic 
assumption of most time series models. Nevertheless, most geoscience time
series are non-stationary. One of the most common approaches to assess
non-stationarity in the form of monotonic trends is the rank-based Mann-Kendall
non-parametric test for a random process null hypothesis against a monotonic
alternative (e.g. [20]). However, this test is not robust to autocorrelation, 
and serial correlation induces the identification of spurious trends. Parametric 
statistical tests of stationarity taking serial correlation into account have been 
developed mainly in econometrics, in order to discriminate between wide sense 
stationarity (no trend), deterministic trends plus stationary stochastic noise, 
and non-stationarity in the form of a unit root (including random walk). 

Standard parametric tests for stationarity such as the Dickey-Fuller (DF)
test [21], the augmented Dickey-Fuller (ADF) test [22] or the Phillips-Perron
(PP) test [23] have been designed to test the null hypothesis of a random walk
against a stationary alternative. The PP test has the advantage over the
classical DF and ADF tests of handling serial correlation and heteroscedasticity
directly in the test statistic. The Phillips-Perron (PP) test is based on the
model

$$X_t = \eta + \beta t + \pi X_{t-1} + \psi_t \tag{1}$$

with the unit root null hypothesis expressed by $H_0: \pi = 1$; the stationary
process is not assumed to be white noise and serial correlation and
heteroscedasticity in the $\psi_t$ term are handled directly in the test statistic.

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [24] complements the
previous tests by testing a stationary null hypothesis in the form of a constant
level or a deterministic trend. The test assumes that the time series can be
decomposed into the sum of a deterministic trend, a random walk ($r_t$) and a
stationary stochastic noise ($\nu_t$):

$$X_t = \beta t + r_t + \nu_t, \qquad r_t = r_{t-1} + \varepsilon_t, \qquad \nu_t \sim N(0, \sigma_\nu^2), \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2), \tag{2}$$

which allows testing a level-stationary hypothesis

$$H_0: \sigma_\varepsilon^2 = 0 \;\; (\beta = 0), \qquad H_1: \sigma_\varepsilon^2 \neq 0 \tag{3}$$

and a trend-stationary hypothesis

$$H_0: \sigma_\varepsilon^2 = 0 \;\; (\beta \neq 0), \qquad H_1: \sigma_\varepsilon^2 \neq 0. \tag{4}$$


Statistical tests assuming a random walk null hypothesis and tests assuming
a stationary null hypothesis are complementary and therefore their joint
application is recommended. In this work, the PP test and the KPSS test
are jointly applied for testing stationarity of a sea-level time series. If both
tests reject the null hypothesis then alternative parametrisations such as long
range dependence should be considered. If both tests fail to reject the null
hypothesis, then the time series (or the tests) are not sufficiently informative
for discriminating the kind of stationary behaviour. Rejection of the unit root
hypothesis in the PP test and no rejection of the KPSS null hypothesis
points to a deterministic trend, while no rejection of the unit root null
hypothesis in the PP test and rejection of the KPSS null hypothesis indicates a
unit root process.
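
The joint interpretation of the two tests described above can be summarised as
a small decision table. The following Python sketch is only an illustration of
that rule; the function name and the returned labels are hypothetical, and the
rejection flags are assumed to have been obtained beforehand from the PP and
KPSS tests at a chosen significance level.

```python
def interpret_pp_kpss(pp_rejects_unit_root: bool,
                      kpss_rejects_stationarity: bool) -> str:
    """Combine the outcomes of the PP and KPSS tests (hypothetical helper).

    pp_rejects_unit_root: True if the PP test rejects the unit root null.
    kpss_rejects_stationarity: True if the KPSS test rejects its
        (level- or trend-) stationary null hypothesis.
    """
    if pp_rejects_unit_root and kpss_rejects_stationarity:
        # Neither a unit root nor (trend-)stationarity describes the series.
        return "alternative parametrisation (e.g. long range dependence)"
    if not pp_rejects_unit_root and not kpss_rejects_stationarity:
        # The data (or the tests) are not informative enough.
        return "inconclusive"
    if pp_rejects_unit_root:
        # Unit root rejected, stationary null not rejected.
        return "deterministic trend"
    # Unit root not rejected, stationary null rejected.
    return "unit root process"


# Example: unit root rejected, trend-stationarity not rejected.
print(interpret_pp_kpss(True, False))
```

The labels only restate the four cases discussed in the text; they do not add
any information beyond the two test outcomes.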


2.2 Scaling 

Long range dependence (or long-memory) was first noted in hydrology from 
the study of the water levels of the Nile river as the tendency for a flood year 
to be followed by another flood year [25, 26]. Long range dependence is one 
of the most important manifestations of scale invariance. A process exhibits 
scale invariance if its spectral density function S is a power law for frequencies 
approaching zero: 

$$\lim_{f \to 0} S(f) = C f^{\alpha} \tag{5}$$

where $C > 0$ and $\alpha$ are constants. The value of the scaling exponent $\alpha$
defines not only long memory but also other kinds of scaling behaviour
(Table 1). Thus, the estimation of the scaling exponent of a time series
provides an alternative and complementary way of characterising its
low-frequency structure.

Table 1: Values of the scaling exponent $\alpha$ for different stochastic processes
(adapted from [18], p. 286).

$\alpha$               Process
$\alpha = 0$           white noise
$\alpha > 0$           short range stationary
$-1 < \alpha < 0$      long memory
$\alpha = -2$          random walk
$\alpha = -1$          1/f or flicker noise

The discrete wavelet transform is a natural tool for scaling processes, since
the wavelet spectrum (corresponding to the variance of the wavelet coefficients
as a function of scale) provides a summary of the spectral density function,
reproducing in the wavelet domain the power laws underlying the scaling
processes [18, 19]. Furthermore, the discrete wavelet transform is insensitive
to deterministic features and acts as a decorrelating transform, converting long
range dependence in the time domain into short range statistical dependence
in the wavelet domain, which makes it particularly appealing for the analysis
of long range dependence [18]. In this study, the scaling exponent $\alpha$
is estimated from the slope of the wavelet spectrum as described in [27]. The
estimation of the scaling exponent from the wavelet spectrum rather than
from the direct Fourier spectrum is particularly advantageous in the case of
nonstationarity (e.g. [28, 29]). Furthermore, estimates based on the wavelet
spectrum are more robust to the presence of trends and periodicities [30].
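
As an illustration of the slope-based estimate, the following Python sketch
fits a straight line to the wavelet spectrum on log10-log10 axes and converts
the slope into a scaling exponent. It is not the procedure of [27]: it assumes
the commonly used approximation that, for a spectral density proportional to
$f^{\alpha}$, the wavelet variance at scale $\tau$ is proportional to
$\tau^{-\alpha-1}$, so that $\alpha = -\mathrm{slope} - 1$. The function name
and the optional weights are hypothetical.

```python
import numpy as np


def scaling_exponent_from_spectrum(scales, wavelet_variance, weights=None):
    """Estimate alpha from the slope of the wavelet spectrum (sketch).

    Assumes wavelet_variance ~ scale**(-alpha - 1), so that the slope of
    log10(variance) against log10(scale) equals -alpha - 1.
    """
    x = np.log10(np.asarray(scales, dtype=float))
    y = np.log10(np.asarray(wavelet_variance, dtype=float))
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)

    # Weighted least squares fit of y = intercept + slope * x.
    A = np.column_stack([np.ones_like(x), x])
    AtW = A.T * w                      # same as A.T @ diag(w)
    intercept, slope = np.linalg.solve(AtW @ A, AtW @ y)

    alpha = -slope - 1.0
    return alpha, slope


# Synthetic wavelet spectrum following an exact power law with alpha = -0.5.
scales = 2.0 ** np.arange(6)           # dyadic scales 1, 2, 4, ..., 32
variance = 3.0 * scales ** (-(-0.5) - 1.0)
alpha_hat, slope_hat = scaling_exponent_from_spectrum(scales, variance)
```

With equal weights this reduces to an ordinary least squares fit; weights
reflecting the uncertainty of the wavelet variance at each scale can be passed
in instead.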

2.3 Trend Estimation 

Long-term sea-level variability is usually quantified through a deterministic
linear model, which can be written in matrix form as

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma) \tag{6}$$

where $y$ is a length-$n$ vector of sea-level observations, $X$ is an
$n \times 2$ matrix ($X = [1 \;\; t]$, where $t$ denotes time), $\beta$ is the
vector of parameters and $\varepsilon$ is a length-$n$ vector of errors with
symmetric and positive definite covariance matrix $\Sigma$.

This linear model is commonly fitted to a sea-level time series by ordinary
least squares (OLS). The regression model underlying ordinary least squares
(denoted OLS model) is the linear model (6) with uncorrelated errors, i.e. a
diagonal covariance matrix $\Sigma = \sigma^2 I$. The trend estimator
$\hat{\beta}_{OLS}$ is then given by

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y \tag{7}$$

with variance

$$V[\hat{\beta}_{OLS}] = \sigma^2 (X^T X)^{-1}. \tag{8}$$

In the OLS model the observations are assumed to be independent, but
this assumption is not valid, in general, for a time series. For a non-diagonal
covariance matrix, $\Sigma \neq \sigma^2 I$, the estimator of
$V[\hat{\beta}_{OLS}]$ (8) is biased and inconsistent, affecting statistical
significance and the estimated standard errors.

The effect of serial correlation on regression is a well known problem. One
of the earliest approaches for handling serial correlation in time series
regression was through transformations to the ordinary least squares estimator,
such as the Cochrane-Orcutt method [31]. Another approach to deal with
serial correlation is to consider an effective sample size: the number of
degrees of freedom is reduced by considering instead of the original series
length an effective sample size computed from e.g. the lag-1 autocorrelation.
This approach was used in the estimation of the linear trend in the Key West
tide gauge record by [16] and to account for serial correlation in altimetry
time series (e.g. [32]). Still within the ordinary least squares framework,
corrections to the mean and variance of the estimates can be derived under an
assumed autocorrelation structure in order to correct for serial correlation
[6, 33].
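
A minimal Python sketch of the effective sample size idea is given below. The
text does not state which formula was used in [16] or [32]; the sketch assumes
the widely used first order autoregressive approximation
$n_{eff} = n (1 - r_1)/(1 + r_1)$, where $r_1$ is the lag-1 autocorrelation
(typically of the regression residuals). Function names are hypothetical.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series (e.g. regression residuals)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))


def effective_sample_size(n, r1):
    """Effective sample size under an assumed AR(1) approximation."""
    return n * (1.0 - r1) / (1.0 + r1)


# For positive autocorrelation the effective sample size is smaller than n.
n_eff = effective_sample_size(1000, 0.5)
```

Positive serial correlation therefore reduces the number of degrees of freedom
available for assessing the significance of the fitted trend.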

Generalised least squares (GLS) is a more general approach for the
estimation of a linear trend. The regression model underlying generalised least
squares (denoted GLS model) is the linear model (6) with correlated errors,
i.e. a non-diagonal covariance matrix $\Sigma \neq \sigma^2 I$. The trend
estimator $\hat{\beta}_{GLS}$ is then given by

$$\hat{\beta}_{GLS} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y \tag{9}$$

with variance

$$V[\hat{\beta}_{GLS}] = (X^T \Sigma^{-1} X)^{-1}. \tag{10}$$

Since estimation of the covariance matrix $\Sigma$ requires $n(n + 1)/2$
parameters, $\Sigma$ cannot be obtained from a sample of size $n$ and
restrictive parametrisations must be assumed, through specification of a
stationary process for the error correlation structure. In this work, a set of
four different stationary processes is considered in the specification of
$\Sigma$: a first order autoregressive process, a second order autoregressive
process, a first order moving average process and a first order
autoregressive/moving average process. Although the restriction to these low
order models is limiting, simulation studies show that differences in
estimation efficiency between correct and misspecified correlation structures
are small, suggesting that there may not be much to be gained in trying very
high order parametrisations for the error correlation structure [34]. For each
time series the model for the error correlation structure is selected from this
set of models using the Akaike Information Criterion (AIC). Numerical
maximisation of the log-likelihood allows the simultaneous estimation of both
the regression coefficients and the parameters of the error covariance process.
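
Equations (9) and (10) can be illustrated with a short NumPy sketch. It is a
simplification of the procedure described above: the error process is assumed
to be a first order autoregressive process with a known coefficient, whereas in
this work the covariance parameters are estimated jointly with the regression
coefficients by maximum likelihood. Function and variable names are
hypothetical.

```python
import numpy as np


def ar1_covariance(n, rho, innovation_var=1.0):
    """Covariance matrix of a stationary AR(1) process with coefficient rho."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return innovation_var / (1.0 - rho ** 2) * rho ** lags


def gls_trend(t, y, Sigma):
    """GLS estimate (9) and covariance (10) for the model y = b0 + b1 * t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    Si_X = np.linalg.solve(Sigma, X)        # Sigma^{-1} X
    cov = np.linalg.inv(X.T @ Si_X)         # (X^T Sigma^{-1} X)^{-1}
    beta = cov @ (Si_X.T @ y)               # Sigma is symmetric
    return beta, cov


# Ten years of monthly values on an exact line (time in years).
t = np.arange(120) / 12.0
y = 2.0 + 0.5 * t

# Same marginal error variance, with and without serial correlation.
rho = 0.6
beta_ar1, cov_ar1 = gls_trend(t, y, ar1_covariance(len(t), rho))
beta_iid, cov_iid = gls_trend(
    t, y, ar1_covariance(len(t), 0.0, innovation_var=1.0 / (1.0 - rho ** 2)))

# Standard error of the slope under each error model.
se_slope_ar1 = np.sqrt(cov_ar1[1, 1])
se_slope_iid = np.sqrt(cov_iid[1, 1])
```

In this example the slope estimate is the same under both error models because
the data lie exactly on a line, while the standard error of the slope is larger
when positive serial correlation is accounted for.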


3 Long-Term Variability of North Atlantic Sea-Level 

3.1 Data 

Sea-level time series from sixteen tide gauge stations in the North Atlantic
with long (> 50 years) and continuous records (gaps < 1 year and missing
values < 2.5%) are analysed (Fig. 1, Table 2). Although longer time series are
available for some of the records (Brest, Halifax), shorter periods have been
selected for analysis in order to avoid large gaps in the time series and
maintain coherency with the criteria for missing observations. Monthly time
series are obtained from the Permanent Service for Mean Sea Level (PSMSL)
database [35] of Revised Local Reference (RLR) tide gauge measurements.
Seasonality is handled by subtracting from each observation the average value
for the corresponding month. Missing observations are filled in by linear
interpolation.

Fig. 1: Map of tide gauge locations: 1-Newlyn; 2-Brest; 3-Coruna; 4-Vigo; 5-Halifax;
6-Portland; 7-Boston; 8-Newport; 9-New York; 10-Baltimore; 11-Kiptopeke;
12-Hampton; 13-Charleston; 14-Fort Pulaski; 15-Mayport; 16-Key West.
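
The two pre-processing steps can be sketched in Python as follows. This is
only an illustration: the function name is hypothetical, missing values are
assumed to be coded as NaN, and the order of the two steps (seasonal adjustment
first, interpolation second) is an assumption, since it is not specified in
the text.

```python
import numpy as np


def deseasonalise_and_fill(values, months):
    """Subtract monthly means and fill gaps by linear interpolation (sketch).

    values: monthly sea-level heights, with NaN marking missing observations.
    months: calendar month (1-12) of each observation.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    adjusted = values.copy()

    # Seasonal adjustment: remove the average of each calendar month.
    for m in range(1, 13):
        sel = months == m
        if np.any(sel & ~np.isnan(values)):
            adjusted[sel] = values[sel] - np.nanmean(values[sel])

    # Gap filling: linear interpolation over the remaining NaN values.
    idx = np.arange(len(adjusted))
    missing = np.isnan(adjusted)
    adjusted[missing] = np.interp(idx[missing], idx[~missing], adjusted[~missing])
    return adjusted


# Two years of a purely seasonal signal with one missing month.
months = np.tile(np.arange(1, 13), 2)
values = months.astype(float)
values[5] = np.nan
adjusted = deseasonalise_and_fill(values, months)
```

For a purely seasonal input such as this one, the adjusted series is identically
zero, including at the interpolated position.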


Table 2: Analysed tide gauge records.

Station        Longitude (E)   Latitude (N)   Period      No. observations   % missing
Newlyn         -05.55          50.10          1916-2003   1056               0.19
Brest          -04.50          48.38          1953-2000   576                0.3
Coruna         -08.40          43.37          1944-2001   696                1.6
Vigo           -08.73          42.23          1944-2001   696                0.86
Halifax        -63.58          44.67          1920-2002   996                1.30
Portland       -70.25          43.67          1912-2003   1104               0.27
Boston         -71.05          42.35          1921-2003   996                0.80
Newport        -71.33          41.50          1931-2003   876                1.26
New York       -74.02          40.70          1927-2003   924                1.08
Baltimore      -76.58          39.37          1903-2003   1212               0.16
Kiptopeke      -75.98          37.17          1952-2003   624                0.80
Hampton        -76.33          36.95          1928-2003   912                0
Charleston     -79.93          32.78          1922-2003   984                0
Fort Pulaski   -80.90          32.03          1935-2003   828                1.21
Mayport        -81.43          30.40          1929-2000   864                0.35
Key West       -81.80          24.55          1913-2003   1092               0.73

Seasonally-adjusted sea-level records exhibit for most stations a seemingly
increasing trend (Fig. 2). Stationarity is tested in the time domain in Sect. 3.2 
and scaling features of the series are analysed in the wavelet domain in 
Sect. 3.3. Linear trends are estimated in Sect. 3.4. 

3.2 Stationarity Tests 

Stationarity of the sea-level time series is examined through the PP and KPSS
statistical tests. The results for the two tests, in terms of the corresponding p-values,
are given in Table 3. 

For all records the PP test rejects the unit root null hypothesis (p-value 
< 0.05) indicating that a stochastic trend from an underlying random walk 
process can be discarded. For Newlyn, Coruna, Vigo, Kiptopeke, Hampton 
and Fort Pulaski the KPSS null hypothesis is not rejected, indicating a
deterministic trend, but long range dependence, often present in sea-level records
[36, 37], cannot be excluded. For the remaining records, the KPSS test leads 
to the rejection of trend stationarity, indicating that these time series are 
not well represented either by a stationary, random walk or trend-stationary 
process and therefore that an alternative parametrisation (such as long range 
dependence) needs to be considered. 

Table 3: The p-values from the KPSS ($H_0$: deterministic trend) and PP
($H_0$: random walk) statistical tests.

Station        KPSS test   PP test
Newlyn         > 0.1       < 0.01
Brest          < 0.01      < 0.01
Coruna         > 0.1       < 0.01
Vigo           0.067       < 0.01
Halifax        < 0.01      < 0.01
Portland       < 0.01      < 0.01
Boston         < 0.01      < 0.01
Newport        0.023       < 0.01
New York       < 0.01      < 0.01
Baltimore      < 0.01      < 0.01
Kiptopeke      > 0.1       < 0.01
Hampton        > 0.1       < 0.01
Charleston     < 0.01      < 0.01
Fort Pulaski   > 0.1       < 0.01
Mayport        0.012       < 0.01
Key West       0.05        < 0.01

Fig. 2: Monthly (seasonally-adjusted) time series of relative sea level heights (RSLH).

3.3 Scaling Exponent

Sea-level is known to exhibit scale-invariance over a wide range of
frequencies [37]. In order to examine the scaling properties of the sea-level
series, a wavelet analysis based on the maximal overlap version of the discrete
wavelet transform [38] is carried out using a Daubechies wavelet filter of
length L = 4 [39]. Brick-wall boundary conditions are applied for unbiased
estimates [40]. The wavelet spectrum is constructed by representing on a
log10-log10 graph the wavelet variance estimated from the resulting wavelet
coefficients [41] versus scale, along with the corresponding 95% confidence
intervals (Fig. 3). An alternative wavelet-based estimation of the scaling
exponent would consist in considering the continuous rather than the discrete
wavelet transform in order to have a larger number of scales over which to
estimate the scaling exponent (e.g. [42]).

Fig. 3: Wavelet spectrum for each tide gauge record.

Table 4: Estimates of the scaling exponent $\alpha$ and associated standard error (s.e.).

Station        $\alpha$   s.e.    Bootstrap s.e.
Newlyn         -0.34      0.062   0.014
Brest          -0.41      0.088   0.023
Coruna         -0.62      0.079   0.051
Vigo           -0.57      0.079   0.086
Halifax        -0.28      0.064   0.089
Portland       -0.72      0.061   0.160
Boston         -0.64      0.064   0.020
Newport        -0.70      0.069   0.035
New York       -0.68      0.067   0.130
Baltimore      -0.55      0.053   0.041
Kiptopeke      -0.67      0.084   0.087
Hampton        -0.63      0.067   0.055
Charleston     -0.59      0.064   0.031
Fort Pulaski   -0.53      0.071   0.025
Mayport        -0.60      0.070   0.030
Key West       -0.54      0.061   0.120

For most records the wavelet spectrum exhibits a linear behaviour within
some scale range. The slope of the wavelet spectrum is estimated by a weighted
least squares estimator that takes into account the large sample properties
of the wavelet variance estimator ([18], pp. 374-378), yielding the scaling
exponent ($\alpha$) and corresponding standard error (Table 4). For comparison,
standard errors derived by bootstrap resampling (e.g. [43, 44]) considering
50 bootstrap replicates are also included in Table 4. For some stations (Key
West, Portland and New York) the bootstrap standard errors are fairly large,
suggesting that the power-law assumption is questionable in these cases. For
all records the values obtained for the scaling exponent $\alpha$ (Table 4) are
consistent with long memory behaviour (within the ]-1, 0[ range) but with
distinct degrees of stochastic persistence, from weak (]-0.4, -0.2[) to moderate
(]-0.6, -0.4[) and strong persistence (]-0.8, -0.6[).
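
The persistence ranges quoted above can be applied mechanically to the
estimates of Table 4, as in the following Python sketch. The function name and
the returned labels are hypothetical; since the ranges are given as open
intervals, values lying exactly on a boundary (for example -0.60) are left
unclassified here rather than assigned to either neighbouring class.

```python
def persistence_class(alpha: float) -> str:
    """Classify a scaling exponent using the open intervals quoted in the text."""
    if not -1.0 < alpha < 0.0:
        return "outside the long memory range"
    if -0.4 < alpha < -0.2:
        return "weak"
    if -0.6 < alpha < -0.4:
        return "moderate"
    if -0.8 < alpha < -0.6:
        return "strong"
    # Boundary values, or values in ]-1, 0[ not covered by the quoted ranges.
    return "unclassified"


# A few estimates copied from Table 4.
examples = {"Newlyn": -0.34, "Brest": -0.41, "Portland": -0.72, "Mayport": -0.60}
classes = {station: persistence_class(a) for station, a in examples.items()}
```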

3.4 Linear Trends 

For the records for which a deterministic linear trend is plausible, the rate 
of sea-level change is estimated by generalised least squares. The estimates 
obtained from ordinary least squares regression are also shown for comparison 
(Table 5). The main difference in using generalised rather than ordinary least
squares lies in the magnitude of the estimated standard errors: as a result of
positive serial correlation, ordinary least squares standard errors are biased
downward. This influences the statistical significance of the estimated trends,
although the trend values themselves are barely affected.
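
The size of this effect can be read directly from the standard errors reported
in Table 5. The short Python sketch below simply computes the ratio of the GLS
to the OLS standard error for each record; the values are copied from Table 5
and the variable names are hypothetical.

```python
# (s.e. GLS, s.e. OLS) in mm/yr, copied from Table 5.
table5_se = {
    "Newlyn": (0.13, 0.083),
    "Coruna": (0.33, 0.17),
    "Vigo": (0.36, 0.18),
    "Halifax": (0.18, 0.059),
    "Kiptopeke": (0.40, 0.16),
    "Hampton": (0.24, 0.10),
    "Fort Pulaski": (0.20, 0.12),
    "Key West": (0.086, 0.052),
}

# Ratio of GLS to OLS standard error for each record.
se_ratio = {station: gls / ols for station, (gls, ols) in table5_se.items()}

for station, ratio in se_ratio.items():
    print(f"{station:13s} {ratio:.2f}")
```

A ratio above one means that the OLS standard error understates the uncertainty
of the trend relative to the GLS estimate.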

Table 5: Linear trends (mm/yr) estimated by GLS. The error correlation structure
is represented for Key West by an AR(1) process and for the remaining series by
an ARMA(1,1) process.

Station        $\hat{\beta}_{GLS}$   s.e.$_{GLS}$   $\hat{\beta}_{OLS}$   s.e.$_{OLS}$
Newlyn         1.73                  0.13           1.72                  0.083
Coruna         1.40                  0.33           1.37                  0.17
Vigo           2.57                  0.36           2.56                  0.18
Halifax        3.25                  0.18           3.29                  0.059
Kiptopeke      3.34                  0.40           3.36                  0.16
Hampton        4.37                  0.24           4.33                  0.10
Fort Pulaski   2.98                  0.20           3.00                  0.12
Key West       2.23                  0.086          2.23                  0.052

Here, linear trends have been derived assuming a deterministic trend (as
suggested by the statistical tests) and a short-range dependent process for the
stochastic component. This assumption of serially correlated (but not
long-range correlated) errors is, however, only valid for the Newlyn and
Halifax records, since the remaining stations exhibit long range dependence
(although in the form of moderate stochastic persistence). This means that a
deterministic linear model can be estimated for Newlyn and Halifax (e.g. as in
[17] and [45]) while for the remaining records the linear model could be
adapted in order to include a long range-dependent stochastic component [46].
An alternative approach for estimating linear slopes along with realistic
uncertainties would be the joint estimation of the linear slope and of the
scaling exponent by maximum likelihood.

Table 5 indicates that the sea-level slope at Coruna is considerably lower
than the one obtained for Vigo, although the two sites are very close.
According to [47] the discrepancy is explained by a jump in the reference level
of the Coruna record. On the western boundary, the estimated sea level trends
are higher for the stations in Chesapeake Bay (Kiptopeke, Hampton). The large
linear trends obtained at Chesapeake Bay may result from land subsidence.
Local subsidence is caused by groundwater extraction [48] while regional
subsidence of the entire Mid-Atlantic coast results from post-glacial adjustment
[49]. Furthermore, Chesapeake Bay has been identified as a tectonically active
area [50]. At Hampton subsidence is possibly enhanced by compaction of the
filling of a large buried impact crater [51].


4 Discussion 

Long-term sea-level variability has been characterised through the application 
of parametric statistical tests for stationarity, wavelet analysis for assessing 
scaling features, and generalised least squares for estimating deterministic 
trends. 

Parametric tests of stationarity are based on asymptotic properties which
are not necessarily met in practice, and therefore require a large sample size
to be efficient. Results from statistical tests alone must therefore be viewed
with caution, particularly for short records (Brest, Coruna, Vigo). Moreover,
statistical tests are designed in such a way that the null hypothesis is
rejected only if there is strong evidence against it. Failure to reject a
deterministic trend does not indicate the existence of a trend, but only that
such a feature cannot be ruled out. Although the application of parametric
statistical tests can give some insight into the stationarity features of a
time series, in practice the distinction between a nearly non-stationary
stochastic process, such as a long range dependent process or a near unit root
process, and a non-stationary deterministic process is very difficult, since
both types of processes yield similar features: an empirical autocorrelation
function dying out slowly and a spectrum with large spectral content at zero
frequency. Therefore, these tests are known to have low power, particularly
against near unit root and fractionally differenced alternatives [52, 53, 54].
The discrete wavelet transform, on the other hand, is blind to polynomial
trends and is particularly useful for assessing the scaling features of a time
series, including long range dependence.

The results obtained from the stationarity tests and the wavelet analysis
show that the analysed tide gauge records exhibit distinct low-frequency
characteristics. The stationarity tests indicate a deterministic trend for
Newlyn, Coruna, Vigo, Kiptopeke, Hampton, Fort Pulaski and Key West (although
only marginally for Key West). Except for Newlyn, all the other records also
exhibit stochastic variability in the form of long range dependence.

For Newlyn, the stochastic dependence is consistent with only a very weak
long-memory process. Thus the trend component for Newlyn can be represented
by a (deterministic) linear trend plus a stationary stochastic noise.
In the case of Brest, the value of $\alpha$ is similar to the value from the
nearby station of Newlyn, although the results from the stationarity tests are
quite different for the two records: for Brest, the trend stationarity null
hypothesis is rejected while for Newlyn a deterministic trend cannot be ruled
out. This is a consequence of the different lengths of the two time series and
of the sensitivity of the stationarity tests to time series length; the wavelet
approach is fairly insensitive to sample size, indicating a similar stochastic
component for the two records. For Portland, Boston, Newport and New York the
stochastic variability is characterised by strong long range dependence. While
a deterministic feature cannot be entirely ruled out, the persistent behaviour
of the stochastic component alone is able to explain the trend in these records
with no need for a deterministic generating process. Thus a long memory model
rather than a deterministic linear model should be considered in the
description of sea level variability from these records. Since for these sites
the changes in the relative height of the sea surface are stochastically
persistent in time, any disturbances in the coastal system can impact long-term
sea-level variability, influencing sea-level variations even after the actual
disturbance has ceased.

The results presented here for long tide gauge records in the North Atlantic
show, despite some local coherency, profound differences in terms of the low
frequency structure of these sea-level time series. These differences suggest
that the trend structure reflects mainly local/regional phenomena. Therefore,
each record must be analysed individually, and results from several tide gauge
records (for example to obtain a regional estimate of sea-level change in the
North Atlantic) should only be jointly considered if the corresponding records
exhibit the same type of low-frequency properties.

The characterisation of long-term sea-level variability is pertinent for the
understanding, estimation and forecasting of sea-level change. For realistic
estimates of the long-term rate of sea-level change and for forecasting future
variations, both deterministic and stochastic contributions need to be taken
into account. This is a challenging task, but this study shows how the
combination of different methodologies, in the time and wavelet domains, can be
used to extract additional information in terms of low-frequency
characteristics from a sea-level time series.
Tide and Mean Sea Level Model Based on
Abstract. In this contribution an empirical approach to global ocean tide and
Mean Sea Level (MSL) modeling based on satellite altimetry observations is
presented in detail. Considering that the satellite altimetry technique can
provide sea level observations at the global scale, spherical harmonics
defined for the whole range of spherical coordinates ($0 < \lambda < 2\pi$ and
$-\pi/2 < \phi < +\pi/2$) could be among the possible choices for global ocean
tide modeling. However, when applied to the modeling of the global ocean tide,
spherical harmonics lose their orthogonality for the following reasons:
(1) Observations of the sea surface are made at discrete points, and not as a
continuous function, which is needed for the orthogonality property of
spherical harmonics in functional space. (2) The range of application of
spherical harmonics for global ocean tide modeling is limited to the sea areas
covered by satellite altimetry observations and not the whole globe, which is
also required for the fulfilment of the orthogonality of spherical harmonics.
In this contribution we show how a set of orthonormal base functions over the
sea areas covered by the satellite altimetry observations can be derived from
spherical harmonics in order to solve the lack of orthogonality. Using the
derived orthonormal base functions, a global MSL model and empirical global
ocean tide models for six major semidiurnal and diurnal tidal constituents,
namely S2, M2, N2, K1, P1, and O1, as well as three long term tidal components,
i.e. Mf, Mm, and Ssa, are developed based on six years of Jason-1 satellite
altimetry sea level data as a numerical case study.

Keywords: Empirical tidal modeling, Harmonic analysis, Spherical harmonics,
Orthonormal base functions, Gram-Schmidt, Tidal constituents, Mean Sea
Level (MSL), Satellite altimetry, TOPEX/Poseidon, Jason-1.
1 Introduction

Accurate knowledge of the ocean tide is essential in various geodetic and
geophysical applications, especially for removal of tidal effects from
terrestrial, airborne, and satellite gravimetry observations. Moreover,
computation of the Mean Sea Level (MSL) as the zero frequency tidal constituent
allows, together with global knowledge of the Earth’s gravitational field, e.g.
in terms of a geopotential model, the derivation of various products such as
the geoid’s gravity potential value $W_0$, Sea Surface Topography (SST) and the
marine geoid from sea level observations (for example [1, 2, 3, 4, 5]).
Besides, engineering activities at sea, such as building harbours and offshore
oil platforms, laying underwater pipelines, and many other applications require
accurate knowledge of sea level and currents, which cannot be derived without a
thorough knowledge of the tide. Information about the tide, currents and
seawater circulation is also of great importance for navigation. For a review
of sea level observations and the history of tidal analysis we refer to the
NOAA web-page at http://co-ops.nos.noaa.gov/predhist.html. Due to the
importance of tidal studies, the establishment of tide gauge stations at
harbours has a long history. For instance, historical tide gauge stations have
been in operation since the early 19th century at San Francisco (USA), Cascais
(Portugal), Brest (France), Newlyn (UK), Hoek Van Holland (Netherlands),
Stockholm (Sweden), and Aberdeen (Scotland).

It is important to note that coastal tide gauge stations along continental
coastlines and islands can only provide local information about the sea tide
in coastal areas and in the close vicinity of the tide gauge stations. For that
reason, even the application of submerged tide gauges, which have been in use
in offshore sea areas since the mid 1960s, could not provide enough sea level
information for global ocean tide modeling. Before satellite altimetry,
the lack of globally distributed sea level observations was the main reason for
the application of differential equations for global ocean tide modeling and for
the development of hydrodynamic and hybrid ocean tide models, based purely
on tide gauge observations and bottom pressure recorders at deep ocean
sites, such as Sch80, derived by Schwiderski 1980 [6], FES94.1, computed by
Le Provost et al. 1994 [7], and FES98, developed by Lefevre et al. 2000 [8].
The advent of the satellite altimetry technique and the application of the
early satellite altimetry sea level observations made ocean tide studies
possible at the global scale. As pioneering global ocean tidal studies based on
satellite altimetry sea level observations, we may refer to Le Provost 1983 [9],
using SEASAT satellite altimetry, and Ray and Cartwright 1990 [10], based on
the GEOSAT satellite altimetry mission.

Today, altimetry satellites have made it possible to access uniformly
distributed global information on temporal sea level variations, and as such
satellite altimetry information is widely used for global ocean tide and MSL
modeling. Modern altimetry satellites are equipped with range altimeter
measuring instruments, which can measure the distance between the satellite’s
altimeter antenna and the sea surface with an accuracy of up to a few
centimetres. For example, the altimeter of the TOPEX/Poseidon satellite is
reported to be accurate to ±2.5 cm [11]. Thanks to geodetic positioning systems
such as SLR (Satellite Laser Ranging), GPS (Global Positioning System), and
DORIS (Doppler Orbit determination and Radiopositioning Integrated on Satellite)
and also the availability of accurate global gravitational field models such as
Earth Gravity Model 1996 (EGM96) [12], the geodetic position of the altimetry
satellites can be determined to a very high degree of certainty. For example,
the TOPEX/Poseidon altimetry satellite can be positioned in its orbit with an
accuracy level of up to ±3 cm [11].

As a result of the availability of versatile satellite altimetry and
tide gauge information, tremendous efforts have been made towards global
ocean tide modeling, so that citing only the outstanding contributions
still results in the following lengthy list: Andersen 1995 [13], Andersen 1995
[14], Cartwright and Ray 1990 [10], Cartwright and Ray 1991 [15], Cartwright 
et al. 1991 [16], Desai and Wahr 1995 [17], Eanes 1994 [18], Eanes 2002 [19], 
Eanes and Bettadpur 1996 [20], Egbert et al. 1994 [21], Egbert 1997 [22], 
Egbert et al. 1999 [23], Egbert and Erofeeva 2002 [24], Egbert and Ray 2000 
[25], Egbert and Ray 2001 [26], Egbert and Ray 2003 [27], Kagan and Kivman 
1993 [28], Kantha 1995 [29], Knudsen 1994 [30], Krohn 1984 [31], Le Provost et 
al. 1994 [7], Le Provost et al. 1998 [32], Le Provost 2002 [33], Lefevre et al. 2000 
[8], Lefevre et al. 2002 [34], Letellier 2004 [35], Letellier et al. 2004 [36], Ma et 
al. 1994 [37], Matsumoto et al. 1995 [38], Matsumoto et al. 2000 [39], Mazzega 
et al. 1994 [40], Ray et al. 1994 [41], Ray 1999 [42], Sanchez and Pavlis 1995 
[43], Schrama and Ray 1994 [44], Schwiderski 1980 [6], Schwiderski 1980 [45], 
Schwiderski 1980 [46], Tierney et al. 2000 [47], and Wang and Rapp 1994 [48]. 
To provide a systematic review of these contributions,
we have arranged Table 1 and Table 2. Table 1 gives a short technical
specification of the abovementioned global ocean tide models,
while Table 2 briefly describes the computational
procedure leading to each model. Quite a few
authors have also tried to assess the validity of the available global
ocean tidal models by making use of various geophysical information and
geodetic observations such as pelagic tide gauge data and GPS observations,
as well as modern sources of gravity measurements. As a sample of those
activities we may refer to Andersen et al. 1995 [49], Baker and Bos 2003 [50],
Bos et al. 2002 [51], King and Padman 2005 [52], King et al. 2005 [53], Llubes
and Mazzega 1997 [54], Shum et al. 1997 [55], and Urschl et al. 2005 [56]. To
give a brief review of the various approaches to ocean tide
modeling and of the various types of ocean tide solutions, Table 3 and Table 4
have been arranged. Table 3 lists the approaches to ocean
tide modeling along with a brief description, while Table 4 lists
the types of ocean tide models.
Table 2: Brief description of the computation procedures used for the global
ocean tide models mentioned in Table 1.

Model Ref. Description

Sch80 [6] Computed by Schwiderski (1980) as the first global hydrodynamic
ocean tide model purely based on tide gauge sea level observations,
it has become a standard reference for the comparison of ocean tide
models. The Schwiderski global ocean tide model is available on a
regular latitude-longitude grid of 1° × 1°.

CR91 [10] Computed by Cartwright and Ray (1990) as a global ocean tide
solution for the diurnal and semidiurnal tidal components with a
latitude-longitude resolution of 1° × 1.5°, it has been derived from
the first year of the GEOSAT satellite altimetry mission using the
orthotide expansion [57].

CSR2.0 [18] Computed by Eanes (1994) as the global ocean tide model of the
Centre for Space Research (CSR), University of Texas at Austin, it is
based on two years of TOPEX/Poseidon satellite altimetry sea level
observations and applies orthotide functions in the response analysis
approach. This model provides diurnal and semidiurnal ocean tidal
constituents for the whole globe at 1° × 1° grid intervals; outside
the coverage area of TOPEX/Poseidon it is extended to
+66° < φ < +72° and -72° < φ < -66° using the CR91 model [10], and to
φ > +72° and φ < -72° via the Sch80 model [6].

TPXO.2 [21] Computed by Egbert et al. (1994) as the global ocean tide model
of the Oregon State University (OSU), based on a global inverse
solution that best fits hydrodynamical solutions and sea level
observations. Sea surface measurements are provided by a homogeneous
selection of the first 40 cycles of TOPEX/Poseidon satellite altimetry
crossover data. This model includes eight major tidal constituents,
namely, M2, S2, N2, K2, K1, O1, P1, and Q1, and is given on a
latitude-longitude grid of 0.58° × 0.70° within the area bounded
by -80° < φ < +70°.

Knudsen [30] Computed by Knudsen (1994) as a global ocean tide model based
on the harmonic analysis approach, with surface spherical harmonic
expansions up to degree and order n_max = 18 as base functions, for
34 cycles of TOPEX/Poseidon satellite altimetry sea level observations.




FES94.1 [7] Computed by Le Provost et al. (1994) as a pure hydrodynamic
global ocean tide model, tuned to fit the globally distributed tide
gauge data. This model is the earliest version of the Finite Element
Solution (FES) global ocean tide models and has been derived on a
finite element grid with very fine resolution near the coast. The
design of the model is based on a non-linear formulation of the
shallow water equations. The model is given on a grid of 0.5° × 0.5°
and contains eight major tidal constituents, namely, M2, S2, N2, K2,
2N2, K1, O1, and Q1. The model covers the global ocean areas, except
for some minor marginal seas such as the Bay of Fundy.

Mazzega [40] Computed by Mazzega et al. (1994) based on one year (40 cycles)
of TOPEX/Poseidon satellite altimetry data as well as coastal and
deep sea tide gauge data, to derive global estimates for eight major
tidal constituents, namely, M2, S2, N2, K2, K1, O1, Q1, and P1, on a
grid of 0.5° × 0.5° within the TOPEX/Poseidon coverage area.

RSC94 [41] Computed by Ray et al. (1994) as a global ocean tide model based
on 65 cycles of TOPEX/Poseidon satellite altimetry data using
orthotide functions in the response analysis approach. This model,
a product of the NASA Goddard Space Flight Center (GSFC), provides
diurnal and semidiurnal tidal components on a 1° × 1° grid within the
TOPEX/Poseidon coverage area.

SR95.0 [44] Computed by Schrama and Ray (1994) as a global ocean tide model
developed by the NASA Goddard Space Flight Center (GSFC), using
approximately 12 months of TOPEX/Poseidon satellite altimetry data
and the harmonic analysis approach. The model includes the major
short-period tidal constituents, namely, M2, S2, N2, K1, and O1, on a
grid of 1° × 1° over the coverage area of TOPEX/Poseidon. SR95.0 is a
deep-ocean global tide model for sea areas deeper than 250 m.

Rapp [48] Computed by Wang and Rapp (1994) as the global ocean tide model
of the Ohio State University (OSU) using 50 cycles of TOPEX/Poseidon
satellite altimetry data and the harmonic analysis approach. This
model is available on a grid of 1° (in latitude) by 1.5° (in
longitude) within the coverage area of TOPEX/Poseidon.




Andersen [13] Computed by Andersen (1995) as a global ocean tide model
representing the major diurnal and semidiurnal tidal constituents, on
a latitude-longitude grid of 0.75° × 0.75° spatial resolution within
the interval -82° < φ < +82°. The data used for the computation of
this solution are from the first 1.5 years of the ERS-1 and
TOPEX/Poseidon altimetry satellites.

DW95.0 [17] Computed by Desai and Wahr (1995) at the University of Colorado
as the global ocean tide model for diurnal and semidiurnal tidal
constituents, using 1.7 years of TOPEX/Poseidon satellite altimetry
sea level observations and applying the orthotide response method
[57]. The model is given on a latitude-longitude grid of 1° × 1°
spatial resolution within the geographical area bounded by
-66° < φ < +66°.

Kantha.1 [29] Computed by Kantha (1995) as a high-resolution global ocean
tidal model, not including the Arctic region, for the semidiurnal
constituents M2, S2, N2, and K2 and the diurnal constituents K1, O1,
P1, and Q1, using two years of TOPEX/Poseidon satellite altimetry sea
surface measurements within sea areas deeper than 1000 m, and coastal
tide gauge sea level observations. The model is given on a grid of
0.2° × 0.2° within the geographical area -77° < φ < +69°.

ORI96 [38] Computed by Matsumoto et al. (1995) as a global ocean tide model
for eight major tidal constituents, namely, M2, S2, N2, K2, K1, O1,
P1, and Q1, with 0.5° × 0.5° spatial resolution, using
TOPEX/Poseidon sea surface height data of cycles 009 to 094 and
applying response analysis to tidal constituents at crossover points
together with hydrodynamical interpolation.

GSFC94A [43] Computed by Sanchez and Pavlis (1995) as the global ocean tide
model of the Goddard Space Flight Center (GSFC) for the main diurnal
and semidiurnal tidal constituents, namely, M2, S2, N2, K2, K1, O1,
P1, and Q1, using approximately 15 months of TOPEX/Poseidon satellite
altimetry sea level data. The GSFC94A model is given on a grid of
2° × 2° for the global ocean areas within -76.75° < φ < +69.25°
and deeper than 250 m.




CSR3.0 [20] Computed by Eanes and Bettadpur (1996) as the global ocean tide
model of the University of Texas for diurnal and semidiurnal tidal
constituents, using the response analysis technique applied to 2.4
years of TOPEX/Poseidon satellite altimetry observations (cycles
001-089). The model is available on a latitude-longitude grid of
0.5° × 0.5° spatial resolution and preserves the fine details of the
FES94.1 model, whilst being more accurate for long wavelength
signals. Within the area φ > +66° and φ < -66° the model is exactly
the same as the FES94.1 ocean tide model [7].

TPXO.3 [22] Computed by Egbert (1997) as the global ocean tide model which,
in the least squares sense, best fits the hydrodynamic tidal solution
to crossover data from the first 116 cycles of the TOPEX/Poseidon
satellite altimetry mission. This model provides eight primary tidal
constituents, namely, K1, O1, P1, Q1, M2, S2, N2, and K2, on a
latitude-longitude grid of 0.58° × 0.70° spatial resolution over the
sea areas bounded by -79.71° < φ < +69.71°.

FES95.2.1 [32] Computed by Le Provost et al. (1998) as an upgraded version
of the FES94.1 [7] tidal solution, via assimilation of
altimeter-derived sea level data into the hydrodynamic model. The
model is given on a latitude-longitude grid of 0.5° × 0.5° for eight
major tidal constituents, namely, K1, O1, Q1, M2, S2, N2, K2, and 2N2.

GOT99.2b [42] Computed by Ray (1999) as the global ocean tide model which
derives its long wavelength signals from the FES94.1 hydrodynamic
model [7] and uses TOPEX/Poseidon data to adjust the hydrodynamic
solution. The model is given on a 0.5° × 0.5° grid. Within the area
φ > +66° and φ < -66° the GOT99.2b model is purely the hydrodynamic
solution, and for the sea areas bounded by -66° < φ < +66° it is
based on both the satellite altimetry data and the hydrodynamic
solution.

GOT00.2 [42] Computed by Ray (1999) as the updated version of GOT99.2 [42]
that assimilates TOPEX/Poseidon, ERS-1, and ERS-2 satellite altimetry
data. This model uses 286 cycles of TOPEX/Poseidon and 81 cycles of
ERS-1 and ERS-2 satellite altimetry data to adjust the a priori
FES94.1 hydrodynamic model [7]. The model is given on a global grid
of 0.5° × 0.5°.



FES98 [8] Computed by Lefevre et al. (2000) as a version of the Finite
Element Solution (FES) hydrodynamic ocean tide models, which
assimilates sea level observations from approximately 700 coastal,
island and deep ocean tide gauges into the hydrodynamic solution.
This model provides eight major tidal constituents, namely, M2, S2,
N2, K2, 2N2, K1, O1, and Q1, on a latitude-longitude grid of
0.25° × 0.25° spatial resolution.

NAO.99b [39] Computed by Matsumoto et al. (2000) as a global ocean tide
model for 16 major tidal constituents on a latitude-longitude grid of
0.5° × 0.5° spatial resolution, based on the assimilation of five
years of TOPEX/Poseidon satellite altimetry sea level measurements
into the hydrodynamic model Sch80 [6].

CSR4.0 [19] Computed by Eanes (2002) as the updated version of the CSR3.0
global ocean tide model [20], using a longer time span of
TOPEX/Poseidon satellite altimetry sea level data. The CSR4.0 model
provides the major diurnal and semidiurnal tidal constituents on a
0.5° × 0.5° grid and has been developed using 6.5 years of
TOPEX/Poseidon satellite altimetry sea level observations (cycles
001 to 239).

TPXO.5 [22] Computed by Egbert (1997) at Oregon State University (OSU) as a
global ocean tide model derived by applying inverse tidal theory to
tide gauge data as well as TOPEX/Poseidon satellite altimetry
observations, so as to strike an optimum balance between the sea
level observations and the linearized hydrodynamic theory. This model
is available on a 0.5° × 0.5° grid. The methods used for the
computation of this model are described in detail by Egbert et al.
(1994) [21] and by Egbert and Erofeeva (2002) [24].

TPXO.6.2 [24] Computed by Egbert and Erofeeva (2002) as an updated version
of the TPXO.5 model, provided on a 0.25° × 0.25° grid.

TPXO.7.0 [24] Computed by Egbert and Erofeeva (2002) as an updated version
of the TPXO.6.2 model, developed on a 0.25° × 0.25° grid. The
TPXO.7.0 model is the current version, which best fits, in the least
squares sense, the Laplace Tidal Equations (LTE) and along-track
averaged data from TOPEX/Poseidon and Jason-1 satellite altimetry sea
level observations.



FES99 [34] Computed by Lefevre et al. (2002) as a hydrodynamic global ocean
tide model of the Finite Element Solution (FES) kind, following the
prior solutions of this kind, i.e., FES95.2.1 [32] and FES98 [8]. The
model results from assimilating 700 tide gauges and 687 cycles of
TOPEX/Poseidon satellite altimetry observations into the hydrodynamic
tidal solution derived from the barotropic equations. The model
includes 8 major tidal constituents, namely, M2, S2, N2, K2, 2N2, K1,
O1, and Q1.

FES2004 [35] Computed by Letellier (2004) as a global 0.125° × 0.125° ocean
tide model of the Finite Element Solution (FES) kind, for 14 tidal
constituents, namely, M2, S2, K2, N2, 2N2, O1, P1, K1, Q1, Mf, Mtm,
Mm, Msqm, and M4.


Here our focus will be on the harmonic analysis approach using orthonormal
base functions over the sea areas for the computation of global ocean tide
models. The application of orthonormal base functions to global sea surface
studies has previously been proposed and applied by, for example, Mainville 1987
[65], Hwang 1991 [66], Hwang 1993 [67], Hwang 1995 [68], Rapp et al. 1995
[69], Rapp et al. 1996 [70], and Rapp 1999 [71]. Our approach differs from
previous contributions to global ocean tide modeling via harmonic analysis in
the type of model applied as basis, namely orthonormal base functions over
the sea areas. We use these orthonormal base functions
to represent the MSL and the sine and cosine coefficients of the nine main tidal
constituents, namely, S2, M2, N2, K1, P1, O1, Mf, Mm, and Ssa, at the global
scale, based on six years of Jason-1 satellite altimetry observations for cycles
001-200.

In the following section, the underlying mathematical theory of our
approach is presented. Section 3, entitled “Case study”, is devoted to technical
details and our numerical results, while Sect. 4, entitled “Assessments”,
covers the numerical tests for checking the validity and the accuracy of the
derived geophysical models. Final conclusions and remarks are given in the
last section.


2 Mathematical Setup and Modeling Scheme 

Let us start the explanation of our approach by the mathematical setup for 
modeling global sea level variation through harmonic analysis using Fourier 
sine and cosine expansion as follows: 
Table 3: Summary list of various approaches for ocean tide modeling.

Tidal Modeling Scheme   Description

Harmonic Analysis   Invented by Darwin in 1883 [58] as an efficient tool for
the study and modeling of the ocean tide from sea level observations.
The harmonic analysis approach uses the Fourier sine and cosine base
functions and computes the projection of the time series of sea level
variations at a point onto these base functions, either through the
orthogonality of the Fourier base functions or through least squares
computations. More details can be found, for example, in Cartwright
and Ray 1990 [10], Cartwright and Ray 1991 [15], Cherniawsky et al.
2001 [59], Cherniawsky et al. 2004 [60], Knudsen 1994 [30], Ponchaut
et al. 2001 [61], and Schrama and Ray 1994 [44].

Response Analysis   This method solves for the response of the ocean surface
to the tidal forcing instead of computing the tidal constituents. See,
for example, Desai and Wahr 1995 [17], Eanes and Bettadpur 1996 [20],
Ray et al. 1994 [41], Smith et al. 1997 [62], Smith et al. 1999 [63],
and Matsumoto et al. 1995 [38] for details on the response analysis
approach to ocean tide modeling. Ma et al. 1994 [37], Matsumoto et al.
1995 [38], Smith 1997 [64], Smith et al. 1997 [62], and Smith et al.
1999 [63] have shown that the “response analysis” and “harmonic
analysis” methods are compatible for ocean tide modeling when applied
to satellite altimetry sea level observations.

Dynamic Approach   Begins with Isaac Newton in 1687 and his hydrostatic
equilibrium theory for the synthesis of the tide phenomenon based on
its driving forces. Almost a century later, in 1775, Pierre-Simon
Laplace established a system of partial differential equations,
referred to as the Laplace Tidal Equations (LTE), to describe the flow
of the water mass due to the tidal forces. The LTE are based on the
bathymetry and the shape of the ocean boundaries and are still used
for the hydrodynamic modeling of the ocean tide. For details see, for
example, Andersen 1995 [14], Egbert et al. 1994 [21], Egbert 1997
[22], Egbert and Erofeeva 2002 [24], Kantha 1995 [29], Le Provost et
al. 1994 [7], Le Provost et al. 1998 [32], Le Provost 2002 [33],
Lefevre et al. 2000 [8], Lefevre et al. 2002 [34], Letellier 2004
[35], Matsumoto et al. 1995 [38], Matsumoto et al. 2000 [39], Ray 1999
[42], Schwiderski 1980 [6], Schwiderski 1980 [45], and Schwiderski
1980 [46].
Table 4: List of various types of ocean tide models.

Ocean Tide Solution   Description

Empirical models   Based on sea level observations and not on the driving
forces of the tide phenomenon.

Hydrodynamic models   Based on the gravitational forces driving the tide
phenomenon and on the interaction of sea bottom topography, the shape
of the ocean boundaries, and friction between the sea bottom and tidal
currents, expressed in a system of partial differential equations.

Assimilation models   Dynamical models assimilating tide gauge and satellite
altimetry sea level observations. In other words, the general dynamics
of the sea, due to the tide, is combined with the sea surface
measurements.


\[
\mathrm{ssh}(\lambda, \phi; t) = U_0(\lambda, \phi) + \sum_{k=1}^{N} f_k A_k(\lambda, \phi) \cos\bigl(\omega_k t + \psi_k(\lambda, \phi) + u_k\bigr). \tag{1}
\]

In Eq. (1), $\mathrm{ssh}(\lambda, \phi; t)$ is the Sea Surface Height (SSH) with respect to a
reference ellipsoid, $(\lambda, \phi)$ are the geodetic longitude and latitude, and $t$ is the
time. The integer $N$ is the total number of tidal constituents
considered in the mathematical model. $\omega_k = 2\pi/T_k$ is the angular velocity of
the tidal constituent $k$, and $T_k$ denotes its period.
Thanks to astronomical tidal studies, the frequencies and the periods of
the tidal constituents are accurately known. $A_k(\lambda, \phi)$ and $\psi_k(\lambda, \phi)$ in Eq. (1)
are respectively the amplitude and phase lag of the tidal constituent $k$;
both are functions of the geodetic coordinates $(\lambda, \phi)$ and are the
unknown parameters of Eq. (1). The factors $f_k$ and $u_k$ express the nodal
modulation of the lunar tidal constituent $k$: they account for the
18.6-year regression of the lunar node, and both depend
on the position of the lunar node. These factors can be computed for the major
lunar tidal constituents, namely M2, N2, K1, O1, Mf, and Mm, by using the
formulae derived by Doodson 1928 [72] (see Appendix A). The second part of
Eq. (1) is also called the Sea Level Anomaly in the oceanographic literature.
To solve for the unknown amplitude $A_k(\lambda, \phi)$ and phase lag
$\psi_k(\lambda, \phi)$ by least squares approximation, and to avoid the singularities
of the amplitude at the amphidromic points, it is more common to write
Eq. (1) in the following form using cosine and sine functions:

\[
\mathrm{ssh}(\lambda, \phi; t) = U_0(\lambda, \phi) + \sum_{k=1}^{N} \bigl\{ U_k(\lambda, \phi)\, f_k \cos(\omega_k t + u_k) + V_k(\lambda, \phi)\, f_k \sin(\omega_k t + u_k) \bigr\}. \tag{2}
\]
Equation (2) can be considered as the basic observation equation of the harmonic
analysis approach. Another advantage of Eq. (2) as a
mathematical model, compared to Eq. (1), is its linearity
with respect to the unknown parameters, which avoids iterating the adjustment
computations. Our criterion for the number of tidal constituents considered
was to strike a balance between (i) the accuracy gained by the tidal model,
which should be compatible with the accuracy of the satellite altimetry
observations, and (ii) the computational labour. Theoretically, including more tidal
constituents yields a more accurate tidal model. In
practice, however, one should bear in mind that the final accuracy of the model
cannot exceed the accuracy of the input data. In Eq. (2), the functions $U_k(\lambda, \phi)$ and
$V_k(\lambda, \phi)$ are the coefficients of the cosine and sine functions of the Fourier expansion,
respectively, and $U_0(\lambda, \phi)$ is the Mean Sea Level (MSL), i.e., the constant part
of the Fourier expansion (the zero-frequency oceanic wave); all are
functions of the geodetic longitude $\lambda$ and latitude $\phi$. Having derived these functions,
the amplitude $A_k(\lambda, \phi)$ and phase $\psi_k(\lambda, \phi)$ functions of the tidal constituents
can be computed as follows:

\[
A_k(\lambda, \phi) = \sqrt{U_k(\lambda, \phi)^2 + V_k(\lambda, \phi)^2} \tag{3}
\]

\[
\psi_k(\lambda, \phi) = \tan^{-1} \frac{-V_k(\lambda, \phi)}{U_k(\lambda, \phi)}. \tag{4}
\]
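The use of Eq. (2) as a linear observation equation can be illustrated at a single point. The following is only a minimal sketch under stated assumptions: the time series is synthetic and sampled hourly like a tide gauge record (not with the much sparser repeat sampling of an altimetry satellite), the nodal factors are fixed to $f_k = 1$ and $u_k = 0$, the periods of M2 and S2 are standard values that are not taken from this chapter, and all function names are hypothetical. The phase is recovered with the sign that follows from expanding the cosine of Eq. (1), i.e., $U_k = A_k \cos\psi_k$ and $V_k = -A_k \sin\psi_k$.

```python
import math

# Assumed constituent periods in hours (standard values, not from the text).
PERIODS_H = {"M2": 12.4206012, "S2": 12.0}

def design_row(t, omegas):
    """One row of Eq. (2) with f_k = 1, u_k = 0: [1, cos w1 t, sin w1 t, ...]."""
    row = [1.0]
    for w in omegas:
        row += [math.cos(w * t), math.sin(w * t)]
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def harmonic_fit(times, ssh, omegas):
    """Least squares estimate of [U0, U1, V1, U2, V2, ...] via normal equations."""
    rows = [design_row(t, omegas) for t in times]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, ssh)) for i in range(n)]
    return solve(ata, atb)

omegas = [2.0 * math.pi / PERIODS_H[name] for name in ("M2", "S2")]
true_msl, true_amp, true_phase = 0.30, [0.80, 0.25], [0.6, -1.1]

# Synthetic hourly record of 30 days generated from Eq. (1).
times = [float(h) for h in range(30 * 24)]
ssh = [true_msl + sum(amp_k * math.cos(w * t + psi_k)
                      for amp_k, w, psi_k in zip(true_amp, omegas, true_phase))
       for t in times]

x = harmonic_fit(times, ssh, omegas)
msl = x[0]                                                           # U_0
amp = [math.hypot(x[1 + 2 * k], x[2 + 2 * k]) for k in range(2)]     # Eq. (3)
phase = [math.atan2(-x[2 + 2 * k], x[1 + 2 * k]) for k in range(2)]  # sign from Eq. (1)
```

Because the unknowns enter Eq. (2) linearly, a single solution of the normal equations is sufficient; fitting $A_k$ and $\psi_k$ directly through Eq. (1) would require an iterative adjustment.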

In order to develop mathematical models for the unknown functions, namely,
$U_k(\lambda, \phi)$, $V_k(\lambda, \phi)$, and $U_0(\lambda, \phi)$, some base functions have to be selected.
Owing to the coverage of the oceans over the Earth, an ideal set of base functions
for the modeling should be defined for the whole range of
spherical angles, i.e., $0 < \lambda < 2\pi$ and $-\pi/2 < \phi < +\pi/2$. Possible candidates
are, for example, the surface spherical harmonics, or the spheroidal surface
harmonics developed by Thong and Grafarend 1989 [73]. In this study, we
first select surface spherical harmonics as a basis, and the unknown functions
$U_k(\lambda, \phi)$, $V_k(\lambda, \phi)$, and $U_0(\lambda, \phi)$ are mathematically formulated as
expansions in these functions up to degree and order $n_{\max}$, as shown in Eqs. (5)
and (6).

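\[
\begin{aligned}
U_k(\lambda, \phi) &= \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} \left[ a^k_{nm} \bar{C}_{nm}(\lambda, \phi) + b^k_{nm} \bar{S}_{nm}(\lambda, \phi) \right] \\
V_k(\lambda, \phi) &= \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} \left[ c^k_{nm} \bar{C}_{nm}(\lambda, \phi) + d^k_{nm} \bar{S}_{nm}(\lambda, \phi) \right] \\
&\qquad \forall k = 1, 2, \ldots, N
\end{aligned} \tag{5}
\]

\[
U_0(\lambda, \phi) = \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} \left[ a^0_{nm} \bar{C}_{nm}(\lambda, \phi) + b^0_{nm} \bar{S}_{nm}(\lambda, \phi) \right] \tag{6}
\]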

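where $a^0_{nm}$, $b^0_{nm}$, $a^k_{nm}$, $b^k_{nm}$, $c^k_{nm}$, and $d^k_{nm}$ are the unknown coefficients to be
determined, and $\bar{C}_{nm}(\lambda, \phi)$ and $\bar{S}_{nm}(\lambda, \phi)$ are the normalised surface spherical
harmonic functions of degree $n$ and order $m$, defined by Eq. (7) [74]

\[
\begin{Bmatrix} \bar{C}_{nm}(\lambda, \phi) \\ \bar{S}_{nm}(\lambda, \phi) \end{Bmatrix}
= \bar{P}_{nm}(\sin\phi) \begin{Bmatrix} \cos m\lambda \\ \sin m\lambda \end{Bmatrix} \tag{7}
\]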


subject to $\forall n = 0, 1, \ldots, n_{\max}$ and $m = 0, 1, \ldots, n$. In Eq. (7), $\bar{P}_{nm}$ are the fully
normalised associated Legendre functions of the first kind of degree $n$ and
order $m$, defined in Appendix B. If the surface spherical
harmonics are expanded up to degree and order $n_{\max}$ and $N$
tidal constituents are included in the mathematical model, then the total number
of unknown parameters that have to be computed within Eq. (2) is equal
to $(2N + 1) \times (n_{\max} + 1)^2$.
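The count of unknowns can be checked by simple enumeration. The sketch below is illustrative only: the function names are hypothetical, and the value $n_{\max} = 18$ in the usage lines is borrowed from the Knudsen entry of Table 2 purely as an example, not as the expansion degree of this study. It counts the cosine harmonics for $m = 0, \ldots, n$ and the sine harmonics for $m = 1, \ldots, n$, since the sine harmonic of order zero vanishes identically.

```python
def harmonics_per_function(n_max):
    """Count C_nm (m = 0..n) and non-vanishing S_nm (m = 1..n) up to degree n_max."""
    count = 0
    for n in range(n_max + 1):
        count += n + 1  # C_n0 ... C_nn
        count += n      # S_n1 ... S_nn (S_n0 is identically zero)
    return count

def number_of_unknowns(n_constituents, n_max):
    """U_0 plus U_k and V_k for every constituent gives (2N + 1) expanded functions."""
    return (2 * n_constituents + 1) * harmonics_per_function(n_max)

# Nine constituents as in this study, with an illustrative n_max of 18.
per_function = harmonics_per_function(18)
total = number_of_unknowns(9, 18)
```

The enumeration reproduces $(n_{\max} + 1)^2$ coefficients per function, so the total grows quadratically with the expansion degree and linearly with the number of constituents.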


Here, it should be mentioned that surface spherical harmonics are orthogonal
base functions only when the spatial domain of their application is the whole
sphere, i.e., spherical angles $0 < \lambda < 2\pi$ and $-\pi/2 < \phi < +\pi/2$. However, no
satellite altimetry mission gives such coverage, and in any case the Earth is
not fully covered by water. Besides, satellite altimetry sea level observations
are made at discrete points and not continuously, so the continuity
condition needed for the orthogonality property of spherical harmonics
is violated. Therefore, if surface spherical harmonics are used as the basis for global
ocean tidal modeling, they lose their orthogonality.
Owing to this non-orthogonality of surface spherical harmonic functions over
the oceans, spectral analysis of oceanic signals using such a representation
may lead to misleading results and implications [67]. It can thus be
concluded that surface spherical harmonics are not optimal for data defined only
over the oceans. To resolve this problem, a different set of base functions has
to be used for the representation of oceanic tides. The ideal base functions
are a set of orthonormal base functions over the oceans or, more precisely, over the
study area. Therefore, as the first step in global ocean tide modeling, one has
to follow a mathematical procedure that leads to functions which are defined over
the study area and are orthogonal there. The application of orthonormal base
functions to oceanic studies at the global scale, e.g. for Sea Surface Topography
(SST) modeling, has previously been proposed by Mainville 1987 [65], Hwang
1991 [66], Hwang 1993 [67], Hwang 1995 [68], Rapp et al. 1995 [69], Rapp et
al. 1996 [70], and Rapp 1999 [71]. Here we summarise our reasons for applying
normalised surface spherical harmonics within Eqs. (5) and (6) as
follows: (i) These functions are well studied and are widely used for the
representation of oceanic signals [30]. (ii) These functions can be orthogonalised
within orthonormalising procedures. For domains of irregular geometry such
as the oceans, the Gram-Schmidt orthonormalising process can be successfully
applied [67]. Next, such orthonormal base functions can be used
to derive generalised Fourier sine and cosine functions within Eqs. (5) and
(6). In this study, this process is used and the related problems are
addressed. The principle of the Gram-Schmidt orthonormalising process
is well documented, for example by Davis 1975 [75] and Kreyszig 1978 [76].
A brief summary of the fundamentals of the Gram-Schmidt orthonormalising
process is presented in Appendix C.
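The loss of orthogonality and its repair can be made concrete with the four harmonics of degree zero and one. The following sketch rests on several assumptions: the "ocean" is a hypothetical domain (everything north of 30° S), the inner products are approximated by a midpoint rule on a 2° grid, and the orthonormalisation uses the textbook equivalence between the Gram-Schmidt process and a Cholesky factorisation $G = LL^T$ of the Gram matrix, which is not necessarily the exact form used in Appendix C. All function names are hypothetical.

```python
import math

SQ3 = math.sqrt(3.0)
# Normalised surface spherical harmonics up to degree 1 (closed forms).
BASIS = [
    lambda lam, phi: 1.0,                                  # C00
    lambda lam, phi: SQ3 * math.sin(phi),                  # C10
    lambda lam, phi: SQ3 * math.cos(phi) * math.cos(lam),  # C11
    lambda lam, phi: SQ3 * math.cos(phi) * math.sin(lam),  # S11
]

def gram(in_domain, n_lat=90, n_lon=180):
    """Area-normalised Gram matrix over the cells selected by in_domain."""
    k = len(BASIS)
    g = [[0.0] * k for _ in range(k)]
    area = 0.0
    dphi, dlam = math.pi / n_lat, 2.0 * math.pi / n_lon
    for i in range(n_lat):
        phi = -math.pi / 2 + (i + 0.5) * dphi
        for j in range(n_lon):
            lam = (j + 0.5) * dlam
            if not in_domain(lam, phi):
                continue
            da = math.cos(phi) * dphi * dlam  # surface element on the unit sphere
            vals = [f(lam, phi) for f in BASIS]
            area += da
            for p in range(k):
                for q in range(k):
                    g[p][q] += vals[p] * vals[q] * da
    return [[v / area for v in row] for row in g]

def cholesky(g):
    """Lower-triangular factor L with g = L L^T."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][t] * y[t] for t in range(i))) / low[i][i])
    return y

g_sphere = gram(lambda lam, phi: True)             # whole sphere: close to identity
g_cap = gram(lambda lam, phi: phi > -math.pi / 6)  # hypothetical ocean domain

low = cholesky(g_cap)
# Gram matrix of the new functions Y = L^{-1} X is L^{-1} G L^{-T}.
# Step 1: W = L^{-1} G, built column by column (G is symmetric).
w_cols = [forward_solve(low, g_cap[q]) for q in range(4)]
w_rows = [[w_cols[q][p] for q in range(4)] for p in range(4)]
# Step 2: row p of W L^{-T} equals L^{-1} applied to row p of W.
g_new = [forward_solve(low, w_rows[p]) for p in range(4)]
```

On the whole sphere the off-diagonal elements vanish, whereas on the restricted domain the element coupling the harmonics of degree 0 and degree 1, order 0, is about 0.43; after the transformation by $L^{-1}$ the Gram matrix of the new functions is the identity.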
Let us start by applying the Gram-Schmidt orthonormalising process to the
spherical harmonics used in global ocean tide modeling, in order to obtain
a set of orthonormal base functions over the study area, i.e., the sea area
covered by satellite altimetry measurements. Spherical harmonics are orthogonal,
and as such linearly independent, functions over
the whole sphere. Therefore, they can be used as input functions of the
Gram-Schmidt orthonormalising process. Let us define a sequence $\{X_i\}$,
$i = 0, 1, 2, \ldots$, consisting of all possible normalised surface spherical harmonic
functions $\bar{C}_{nm}(\lambda, \phi)$ and $\bar{S}_{nm}(\lambda, \phi)$ up to maximum degree and order $n_{\max}$
as follows:

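\[
\{X_i\} = \{\bar{C}_{00}, \bar{C}_{10}, \bar{C}_{11}, \bar{S}_{11}, \bar{C}_{20}, \bar{C}_{21}, \bar{S}_{21}, \bar{C}_{22}, \bar{S}_{22},
\bar{C}_{30}, \bar{C}_{31}, \bar{S}_{31}, \bar{C}_{32}, \bar{S}_{32}, \bar{C}_{33}, \bar{S}_{33}, \ldots\}. \tag{8}
\]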

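Considering the normalised surface spherical harmonic functions $\bar{C}_{nm}(\lambda, \phi)$
and $\bar{S}_{nm}(\lambda, \phi)$, the Gram matrix $G$ can be defined by Eq. (9) as follows:

\[
G = \begin{bmatrix}
\langle \bar{C}_{00} | \bar{C}_{00} \rangle_\zeta & \langle \bar{C}_{00} | \bar{C}_{10} \rangle_\zeta & \langle \bar{C}_{00} | \bar{C}_{11} \rangle_\zeta & \langle \bar{C}_{00} | \bar{S}_{11} \rangle_\zeta & \cdots \\
\langle \bar{C}_{10} | \bar{C}_{00} \rangle_\zeta & \langle \bar{C}_{10} | \bar{C}_{10} \rangle_\zeta & \langle \bar{C}_{10} | \bar{C}_{11} \rangle_\zeta & \langle \bar{C}_{10} | \bar{S}_{11} \rangle_\zeta & \cdots \\
\langle \bar{C}_{11} | \bar{C}_{00} \rangle_\zeta & \langle \bar{C}_{11} | \bar{C}_{10} \rangle_\zeta & \langle \bar{C}_{11} | \bar{C}_{11} \rangle_\zeta & \langle \bar{C}_{11} | \bar{S}_{11} \rangle_\zeta & \cdots \\
\langle \bar{S}_{11} | \bar{C}_{00} \rangle_\zeta & \langle \bar{S}_{11} | \bar{C}_{10} \rangle_\zeta & \langle \bar{S}_{11} | \bar{C}_{11} \rangle_\zeta & \langle \bar{S}_{11} | \bar{S}_{11} \rangle_\zeta & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
\langle \bar{C}_{nn} | \bar{C}_{00} \rangle_\zeta & \langle \bar{C}_{nn} | \bar{C}_{10} \rangle_\zeta & \langle \bar{C}_{nn} | \bar{C}_{11} \rangle_\zeta & \langle \bar{C}_{nn} | \bar{S}_{11} \rangle_\zeta & \cdots \\
\langle \bar{S}_{nn} | \bar{C}_{00} \rangle_\zeta & \langle \bar{S}_{nn} | \bar{C}_{10} \rangle_\zeta & \langle \bar{S}_{nn} | \bar{C}_{11} \rangle_\zeta & \langle \bar{S}_{nn} | \bar{S}_{11} \rangle_\zeta & \cdots
\end{bmatrix} \tag{9}
\]

where $\langle \cdot | \cdot \rangle_\zeta$ symbolises the inner product of the normalised spherical
harmonic functions over the domain $\zeta$, i.e., the oceans covered by satellite
altimetry. The inner products in Eq. (9) can be written as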


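\[
\begin{Bmatrix}
\langle \bar{C}_{nm}(\lambda, \phi) | \bar{C}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{S}_{nm}(\lambda, \phi) | \bar{S}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{C}_{nm}(\lambda, \phi) | \bar{S}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{S}_{nm}(\lambda, \phi) | \bar{C}_{rs}(\lambda, \phi) \rangle_\zeta
\end{Bmatrix}
= \frac{1}{a_\zeta} \iint_\zeta \bar{P}_{nm}(\sin\phi)\, \bar{P}_{rs}(\sin\phi)
\begin{Bmatrix}
\cos m\lambda \cos s\lambda \\
\sin m\lambda \sin s\lambda \\
\cos m\lambda \sin s\lambda \\
\sin m\lambda \cos s\lambda
\end{Bmatrix} da. \tag{10}
\]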


In Eq. (10), $a_\zeta$ represents the total sea area covered by satellite altimetry
observations, and $da = \cos\phi \, d\lambda \, d\phi$ denotes a surface differential element. The
total sea area $a_\zeta$ can be computed as the sum of the areas of the finite elements
covering the whole sea area of interest as follows:

\[
a_\zeta = \sum_{k=1}^{K} \sum_{l=1}^{L} \omega_{kl} \int_{\phi_k}^{\phi_{k+1}} \int_{\lambda_l}^{\lambda_{l+1}} \cos\phi \, d\lambda \, d\phi. \tag{11}
\]
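Equation (11) can be evaluated in closed form block by block, because the double integral of $\cos\phi$ over one block equals the longitude width times the difference of the sines of the bounding latitudes. The sketch below assumes a unit sphere, a regular block layout, and a mask `sea[k][l]` holding the value of the sea function for each block; the function name and the two masks are hypothetical.

```python
import math

def sea_area(lat_edges_deg, lon_edges_deg, sea):
    """Eq. (11): sum of block areas on the unit sphere weighted by the sea function.

    sea[k][l] is 1 for a sea block and 0 otherwise (hypothetical mask layout).
    """
    total = 0.0
    for k in range(len(lat_edges_deg) - 1):
        s0 = math.sin(math.radians(lat_edges_deg[k]))
        s1 = math.sin(math.radians(lat_edges_deg[k + 1]))
        for l in range(len(lon_edges_deg) - 1):
            dlam = math.radians(lon_edges_deg[l + 1] - lon_edges_deg[l])
            # closed form of the double integral of cos(phi) over one block
            total += sea[k][l] * dlam * (s1 - s0)
    return total

lat_edges = list(range(-90, 91, 30))  # 6 latitude bands
lon_edges = list(range(0, 361, 60))   # 6 longitude sectors
all_sea = [[1] * 6 for _ in range(6)]
north_only = [[1 if lat_edges[k] >= 0 else 0 for _ in range(6)] for k in range(6)]

a_full = sea_area(lat_edges, lon_edges, all_sea)      # whole unit sphere
a_north = sea_area(lat_edges, lon_edges, north_only)  # northern hemisphere only
```

With every block flagged as sea the sum returns the area $4\pi$ of the unit sphere, which is a convenient sanity check before a real land-sea mask is applied.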


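The elements of the Gram matrix defined in Eq. (9) in terms of normalised surface
spherical harmonics can be written as follows:

\[
\begin{Bmatrix}
\langle \bar{C}_{nm}(\lambda, \phi) | \bar{C}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{S}_{nm}(\lambda, \phi) | \bar{S}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{C}_{nm}(\lambda, \phi) | \bar{S}_{rs}(\lambda, \phi) \rangle_\zeta \\
\langle \bar{S}_{nm}(\lambda, \phi) | \bar{C}_{rs}(\lambda, \phi) \rangle_\zeta
\end{Bmatrix}
= \frac{1}{a_\zeta} \sum_{k=1}^{K} \sum_{l=1}^{L} \omega_{kl}\, I^k_{nmrs} \int_{\lambda_l}^{\lambda_{l+1}}
\begin{Bmatrix}
\cos m\lambda \cos s\lambda \\
\sin m\lambda \sin s\lambda \\
\cos m\lambda \sin s\lambda \\
\sin m\lambda \cos s\lambda
\end{Bmatrix} d\lambda. \tag{12}
\]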


Here, $K$ and $L$ in Eqs. (11) and (12) are the numbers of blocks in the latitudinal
and longitudinal directions, and $\omega_{kl}$, which can be called the “sea function”, equals
1 when the integration is over the sea area of interest and 0 in other cases, i.e.,

\[
\omega_{kl} = \begin{cases} 1, & (\lambda, \phi) \in \zeta \\ 0, & (\lambda, \phi) \notin \zeta \end{cases} \tag{13}
\]


In Eq. (12), $I^k_{nmrs}$ represents the integral of the product of two fully normalised
associated Legendre functions within an element as follows:

\[
I^k_{nmrs} = \int_{\phi_k}^{\phi_{k+1}} \bar{P}_{nm}(\sin\phi)\, \bar{P}_{rs}(\sin\phi) \cos\phi \, d\phi. \tag{14}
\]
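The block-wise evaluation of Eqs. (11) to (14) can be sketched for the cosine-cosine elements of the Gram matrix. The example is restricted to degrees 0 and 1, for which the fully normalised Legendre functions have simple closed forms; the latitude integral of Eq. (14) is evaluated here by Simpson's rule in $x = \sin\phi$ rather than by recursion, and the longitude integral analytically. The block layout, the masks and all function names are hypothetical.

```python
import math

SQ3 = math.sqrt(3.0)
# Closed forms of a few fully normalised Legendre functions, x = sin(phi).
PBAR = {
    (0, 0): lambda x: 1.0,
    (1, 0): lambda x: SQ3 * x,
    (1, 1): lambda x: SQ3 * math.sqrt(1.0 - x * x),
}

def legendre_block(nm, rs, phi0, phi1, steps=200):
    """Eq. (14) by Simpson's rule in x = sin(phi), so that cos(phi) dphi = dx."""
    x0, x1 = math.sin(phi0), math.sin(phi1)
    h = (x1 - x0) / steps
    f = lambda x: PBAR[nm](x) * PBAR[rs](x)
    total = f(x0) + f(x1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(x0 + i * h)
    return total * h / 3.0

def cos_cos_block(m, s, lam0, lam1):
    """Analytic integral of cos(m lam) cos(s lam) over [lam0, lam1]."""
    def prim(freq, lam):  # primitive of cos(freq * lam)
        return lam if freq == 0 else math.sin(freq * lam) / freq
    return 0.5 * (prim(m - s, lam1) - prim(m - s, lam0)
                  + prim(m + s, lam1) - prim(m + s, lam0))

def gram_cc(nm, rs, lat_edges, lon_edges, sea):
    """<C_nm | C_rs> over the sea blocks, following Eqs. (11) and (12)."""
    num, area = 0.0, 0.0
    for k in range(len(lat_edges) - 1):
        i_k = legendre_block(nm, rs, lat_edges[k], lat_edges[k + 1])
        band = math.sin(lat_edges[k + 1]) - math.sin(lat_edges[k])
        for l in range(len(lon_edges) - 1):
            if not sea[k][l]:
                continue  # sea function equals zero for this block
            num += i_k * cos_cos_block(nm[1], rs[1], lon_edges[l], lon_edges[l + 1])
            area += band * (lon_edges[l + 1] - lon_edges[l])
    return num / area

lat_edges = [-math.pi / 2 + k * math.pi / 12 for k in range(13)]  # 15 degree bands
lon_edges = [l * math.pi / 12 for l in range(25)]                 # 15 degree sectors
full = [[1] * 24 for _ in range(12)]
north = [[1 if k >= 6 else 0] * 24 for k in range(12)]

g_11_11_full = gram_cc((1, 1), (1, 1), lat_edges, lon_edges, full)
g_00_10_full = gram_cc((0, 0), (1, 0), lat_edges, lon_edges, full)
g_00_10_north = gram_cc((0, 0), (1, 0), lat_edges, lon_edges, north)
```

Since the latitude integral depends only on the band index $k$, it is computed once per band and reused for every longitude block of that band, which is the main saving offered by the separable form of Eq. (12).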


Using symbolic toolboxes, such as those provided by Mathematica
or Matlab, the integrals of the products of cosine and sine functions in Eq. (10) can
be computed analytically, even up to a very high degree and order. However,
this is not possible for the integrals of the products of two fully
normalised associated Legendre functions, i.e., $I^k_{nmrs}$ in Eq. (12). Here $I^k_{nmrs}$
can only be computed by recursive formulae, which are given by Mainville
1987 [65], and were applied by Hwang 1991 [66], Hwang 1993 [67], and Hwang 1995
[68] to Sea Surface Topography (SST) modeling, as follows:


4 


k 

nmrs 


a(n,m) f n-r-2 rk i 2r+l rk 

n+r+1 1 a(n-l,m) 1 n-2,mrs ' a(r,s) 1 n-l,m,r-l,s 


(1 - X 2 ) P n - hm (x)P rs (x)\ X x k k+i } 


(15) 


when n ^ m and r ^ s 
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= 

S nmrs 

^l{w^)lnn,r-2, s ~ (1 ~ ^P^P^x) 


when n = m and r ^ s, 


^nmrs 

a(n,m) r n-r-2 jk _ /i _ 2\ p / \ p ( T \\ Xk \ 

n+r+l U(n-l,m) 1 rr,n-2,m l 1 x ^/ 

when n ^ to and r = s, 
f k = 

>>nmrs 

^^{{n + r)b{n)b{n- l)/£_ 2 ,„-2,rr + £-Pnn(aOAx(®)|“* +1 } 


(16) 


(17) 


(18) 


when n = to, r = s, and n ^ 0,1. In the above equations a; = sin0, Xk = 
sin </>*,, = sin<^/-_|_i, and a (•,•), and &(•) are defined as follows: 




6(1) = 73 


6 («) = / 


-./2 i ±l,Vn>l. 


(19) 
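As a plausibility check of the sectorial case ($n = m$, $r = s$), the sketch below compares one step of the recursion, taken in the form $I_{nnrr} = \{(n+r)\,b(n)\,b(n-1)\,I_{n-2,n-2,rr} + [x \bar{P}_{nn}\bar{P}_{rr}]\}/(n+r+1)$, with a direct Gauss-Legendre quadrature of the same integral. It assumes the sectorial functions obey $\bar{P}_{nn}(x) = b(n)\sqrt{1-x^2}\,\bar{P}_{n-1,n-1}(x)$ with $\bar{P}_{00} = 1$; the function names are hypothetical, and the lower-order integral is itself obtained by quadrature rather than from stored initial values.

```python
import numpy as np

def b(n):
    """Ratio between successive fully normalised sectorial Legendre functions."""
    return np.sqrt(3.0) if n == 1 else np.sqrt((2.0 * n + 1.0) / (2.0 * n))

def pbar_sectorial(n, x):
    """Fully normalised sectorial function Pbar_nn(x), starting from Pbar_00 = 1."""
    x = np.asarray(x, dtype=float)
    p = np.ones_like(x)
    for j in range(1, n + 1):
        p = b(j) * np.sqrt(1.0 - x**2) * p
    return p

def quad_integral(n, r, x_lo, x_hi, order=64):
    """Gauss-Legendre quadrature of Pbar_nn * Pbar_rr over [x_lo, x_hi]."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    x = 0.5 * (x_hi - x_lo) * nodes + 0.5 * (x_hi + x_lo)
    values = pbar_sectorial(n, x) * pbar_sectorial(r, x)
    return 0.5 * (x_hi - x_lo) * np.sum(weights * values)

def recursive_integral(n, r, x_lo, x_hi):
    """One step of the sectorial recursion (valid for n >= 2)."""
    boundary = (x_hi * pbar_sectorial(n, x_hi) * pbar_sectorial(r, x_hi)
                - x_lo * pbar_sectorial(n, x_lo) * pbar_sectorial(r, x_lo))
    lower = quad_integral(n - 2, r, x_lo, x_hi)
    return ((n + r) * b(n) * b(n - 1) * lower + boundary) / (n + r + 1)

# Hypothetical latitude band between 10 and 11 degrees north.
x_k, x_k1 = np.sin(np.radians(10.0)), np.sin(np.radians(11.0))
direct = quad_integral(2, 2, x_k, x_k1)
stepped = recursive_integral(2, 2, x_k, x_k1)
```

The two values agree to rounding error, which is the behaviour one would expect if the recursion is applied degree by degree from the symbolically computed initial values.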


In order to compute the needed initial values of Eqs. (15) to (18), i.e., for
$n = 0, 1$, $m = 0, \ldots, n$, $r = n, \ldots, n_{\max}$, and $s = m, \ldots, r$, we used the
symbolic programming facilities of Matlab. Alternatively, there is another method to
compute the integral of the product of two fully normalised associated Legendre
functions, i.e., $I^{k}_{nmrs}$, named the “Product-Sum Formulae” and applied by Hwang 1991
[66], which is not used in this study. Once the Gram matrix G has been computed by
the above steps, the elements of the matrix C of combination coefficients $c_{ij}$ can
be derived via a Cholesky decomposition of the Gram matrix G, as shown in Appendix C
by Eqs. (C.5) and (C.6). Now let us define the sequence $\{X_i\}$, $i = 0, 1, 2, \ldots$,
consisting of the orthonormal base functions $O_{nm}(\lambda,\phi)$ and $R_{nm}(\lambda,\phi)$ over
the study area, as follows:

$$
\{X_i\} = \{ O_{00}, O_{10}, O_{11}, R_{11}, O_{20}, O_{21}, R_{21}, O_{22}, R_{22},
O_{30}, O_{31}, R_{31}, O_{32}, R_{32}, O_{33}, R_{33},
\ldots, O_{n_{\max} n_{\max}}, R_{n_{\max} n_{\max}} \}. \qquad (20)
$$

Here the elements of the matrix C, i.e., the combination coefficients of the
Gram-Schmidt orthonormalising process, as defined in Appendix C by Eqs. (C.2),
(C.3), and (C.4), produce the set of orthonormal base functions $\{X_i\}$ from
the set of surface spherical harmonic functions $\{Y_j\}$ as follows:

$$
X_i(\lambda,\phi) = \sum_{j=0}^{i} c_{ij}\, Y_j(\lambda,\phi). \qquad (21)
$$
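The chain from Gram matrix to combination coefficients can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the four fully normalised harmonics of degrees 0 and 1, a hypothetical 2° × 2° grid with an invented land block, a midpoint-rule approximation of the inner products instead of the analytical integrals, and the convention $G = LL^T$, $C = L^{-1}$, which may differ in form from Eqs. (C.5) and (C.6).

```python
import numpy as np

# Cell centres of a hypothetical 2 deg x 2 deg grid restricted to |lat| < 66 deg.
lat = np.radians(np.arange(-65.0, 66.0, 2.0))
lon = np.radians(np.arange(1.0, 360.0, 2.0))
phi, lam = np.meshgrid(lat, lon, indexing="ij")

# Hypothetical sea function: 1 over sea cells, 0 over an invented land block.
sea = np.ones_like(phi)
sea[(np.abs(phi) < np.radians(30.0)) & (lam < np.radians(60.0))] = 0.0

# Fully normalised surface spherical harmonics, ordered C00, C10, C11, S11.
Y = np.stack([
    np.ones_like(phi),
    np.sqrt(3.0) * np.sin(phi),
    np.sqrt(3.0) * np.cos(phi) * np.cos(lam),
    np.sqrt(3.0) * np.cos(phi) * np.sin(lam),
]).reshape(4, -1)

# Area weights normalised by the total sea area (midpoint rule).
w = (sea * np.cos(phi)).ravel()
w /= w.sum()

G = (Y * w) @ Y.T            # Gram matrix of inner products over the sea domain
L = np.linalg.cholesky(G)    # G = L L^T with L lower triangular
C = np.linalg.inv(L)         # lower-triangular combination coefficients c_ij
X = C @ Y                    # X_i = sum over j <= i of c_ij * Y_j

G_X = (X * w) @ X.T          # Gram matrix of the new functions
cond_G = np.linalg.cond(G)   # grows as the harmonics become nearly dependent
```

Because C is lower triangular, each new function only involves harmonics up to its own index, and `G_X` is the identity matrix to rounding error. The condition number `cond_G` is the kind of diagnostic that indicates when further harmonics stop being numerically independent over the domain.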

As such, the orthonormal base functions $O_{nm}(\lambda,\phi)$ and $R_{nm}(\lambda,\phi)$ can be
derived from the surface spherical harmonic functions $Y_j(\lambda,\phi)$ using Eqs. (22) to
(24) (see Mainville 1987 [65] and Hwang 1995 [68]):

$$
O_{nm}(\lambda,\phi) = c_{ii}\, C_{nm}(\lambda,\phi) + \sum_{j=0}^{i-1} c_{ij}\, Y_j(\lambda,\phi) \qquad (22)
$$

subject to $i = n^2$ for $m = 0$, and $i = n^2 + 2m - 1$ for $m \neq 0$,

$$
R_{n0}(\lambda,\phi) = 0 \qquad (23)
$$

subject to $m = 0$, and

$$
R_{nm}(\lambda,\phi) = c_{ii}\, S_{nm}(\lambda,\phi) + \sum_{j=0}^{i-1} c_{ij}\, Y_j(\lambda,\phi) \qquad (24)
$$

with $i = n^2 + 2m$ for $m \neq 0$. The maximum degree and order of the spherical
harmonic expansion can be determined from the numerically linearly independent
columns of the Gram matrix (see, for example, Hwang 1993 [67] and Rapp 1999 [71]).
Now the orthonormalised base functions $O_{nm}(\lambda,\phi)$ and $R_{nm}(\lambda,\phi)$ can be
used in Eqs. (5) and (6) in order to obtain the coefficients of the orthonormal
base functions:


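$$
\left.
\begin{aligned}
u_k(\lambda,\phi) &= \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} a^{k}_{nm}\, O_{nm}(\lambda,\phi) + b^{k}_{nm}\, R_{nm}(\lambda,\phi) \\
v_k(\lambda,\phi) &= \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} c^{k}_{nm}\, O_{nm}(\lambda,\phi) + d^{k}_{nm}\, R_{nm}(\lambda,\phi)
\end{aligned}
\right\} \quad \forall k = 1, 2, \ldots, N \qquad (25)
$$

$$
u_0(\lambda,\phi) = \sum_{n=0}^{n_{\max}} \sum_{m=0}^{n} a^{0}_{nm}\, O_{nm}(\lambda,\phi) + b^{0}_{nm}\, R_{nm}(\lambda,\phi). \qquad (26)
$$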


Finally, the coefficients of the orthonormal base functions, namely, $a^{0}_{nm}$, $b^{0}_{nm}$,
$a^{k}_{nm}$, $b^{k}_{nm}$, $c^{k}_{nm}$, and $d^{k}_{nm}$, as well as the covariance matrix of the
unknown parameters, can be estimated using the satellite altimetry observations in
Eq. (2) as follows:

$$
\hat{x} = (A^T C_l^{-1} A)^{-1} A^T C_l^{-1}\, l, \qquad
C_{\hat{x}} = (A^T C_l^{-1} A)^{-1} \qquad (27)
$$

where $l$ represents the vector of observations, $A$ is the design matrix, $P = C_l^{-1}$ is the
weight matrix, $\hat{x}$ is the vector of estimated unknown coefficients, and $C_{\hat{x}}$ is
the covariance matrix of the estimated coefficients.

Selection of an appropriate sampling interval is one of the most important issues to
be addressed when dealing with harmonic phenomena. More precisely, aliasing and
oversampling are the two key issues that must be avoided in harmonic analysis.
Oversampling has no benefit: it only lengthens the numerical calculations and may
even push the matrix operations beyond the capacity of common computers. Aliasing,
however, is a much more serious problem. According to the Nyquist sampling theorem,
the sampling frequency must be at least twice the maximum frequency to be measured.
When the signal frequency is higher than the Nyquist limit, i.e., half the sampling
frequency, aliasing occurs. Thus, fulfilling the sampling theorem is an essential
condition for a complete estimate of the coefficients of the base functions. In our
case, to avoid this potential source of error, the sampling interval of the satellite
altimetry data along the track must be at most half of the spatial resolution of the
maximum degree of the spherical harmonic expansion. Because the spatial resolution of
the maximum degree of surface spherical harmonic functions is half of the spatial
wavelength, it follows that the sampling interval must be at most a quarter of the
spatial wavelength. Thus the sampling interval can be computed by the formula
$(6400 \times \pi / n_{\max}) / 2$ (km).
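The rule above amounts to a few lines of arithmetic. The sketch below evaluates it for an illustrative maximum degree of 20, using the rounded Earth radius of 6400 km that appears in the formula; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6400.0  # rounded radius used in the formula above

def spatial_wavelength_km(n_max, radius_km=EARTH_RADIUS_KM):
    """Wavelength of a surface spherical harmonic of degree n_max."""
    return 2.0 * math.pi * radius_km / n_max

def max_sampling_interval_km(n_max, radius_km=EARTH_RADIUS_KM):
    """Quarter of the wavelength, i.e. half of the spatial resolution."""
    return spatial_wavelength_km(n_max, radius_km) / 4.0

wavelength = spatial_wavelength_km(20)      # about 2010.62 km
interval = max_sampling_interval_km(20)     # about 502.65 km
```

Any along-track spacing below `interval` satisfies the sampling condition for that degree; a smaller spacing mainly adds computational cost.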


3 Case Study 

In this section we present technical details related to the computation of the
global ocean tide amplitude and phase models for six major semidiurnal and diurnal
tidal constituents, namely, S2, M2, N2, K1, P1, and O1, together with three
long-term tidal components, namely, Mf, Mm, and Ssa, and the Mean Sea Level (MSL),
based on orthonormal base functions over the study area and the first six years of
Jason-1 satellite altimetry data, comprising cycles 001-200. Jason-1 is jointly
conducted by the Centre National d’Etudes Spatiales (CNES) and the National
Aeronautics and Space Administration (NASA). Jason-1 is a follow-on mission to the
highly successful TOPEX/Poseidon project and overflies the TOPEX/Poseidon ground
tracks. Jason-1 was launched on December 7, 2001, and its first cycle began on
January 15, 2002, coinciding with TOPEX/Poseidon cycle 344 [77]. Our computations
are based on Geophysical Data Records (GDR) provided by the Jet Propulsion
Laboratory (JPL) from http://podaac.jpl.nasa.gov. To the observed altimeter range,
i.e., the measured distance between satellite and sea level, the following
corrections need to be applied: (i) wet tropospheric delay, (ii) dry tropospheric
delay, (iii) ionospheric delay, (iv) electromagnetic bias, (v) inverse barometer
pressure, (vi) solid earth tide, and (vii) pole tide. In order to apply these
corrections we used the standard correction formulas provided by JPL for Jason-1 GDR
data records [77]. Using these formulas the corrected range, i.e., the corrected
distance between the satellite and sea level, can be computed. The corrected range
was then combined with the measured geodetic ellipsoidal height of the satellite,
i.e., its altitude, derived from its precise positioning systems, to determine the
sea surface height $ssh(\lambda,\phi;t)$ with respect to a reference ellipsoid in a
point-wise manner as follows:

$$
ssh(\lambda,\phi;t) = altitude(\lambda,\phi;t) - range(\lambda,\phi;t). \qquad (28)
$$

The reference ellipsoid used for Jason-1 is an ellipsoid of revolution with an
equatorial radius of $a = 6378136.3$ m and flattening $f = 1/298.257$, the same
reference ellipsoid as used by TOPEX/Poseidon. The Jason-1 sea level measurement is
reported to have an accuracy of ±4.2 cm [77]. We excluded the data over shallow
water within a band of 5 km from the shoreline to avoid high-frequency noise. We
also did not use data flagged by JPL as less accurate data points, according to
[77]. As mentioned before, the frequencies of the tidal constituents are taken as
known values in our study because their precise values are known from astronomical
studies. Table 5 shows the periods of the nine tidal constituents used here, namely,
S2, M2, N2, K1, P1, O1, Mf, Mm, and Ssa. Computation of the elements of the Gram
matrix G requires determination of the inner products of normalised surface
spherical harmonic functions defined by Eq. (9) over the sea areas of interest,
i.e., the sea area covered by the satellite altimetry observations. Computation of
the integrals can be readily done in analytical form if the sea areas are known. For
this purpose we first covered the world with a latitude-longitude 1° × 3° grid;
next, using the Jason-1 satellite altimetry data, those grid cells containing at
least one altimetry measurement flagged as sea were taken as sea cells. The
criterion for selecting the 1° × 3° search grid cells to distinguish land from sea
was the minimum cross-track spatial resolution of Jason-1 data at the equator, which
is about 3°. It is also important to note that in this way the areas outside the
coverage of the Jason-1 satellite are also excluded from the integration domain, as
is needed. Using the determined 1° × 3° grid cells over the sea areas, the elements
of the Gram matrix are computed for each cell analytically and then summed up in
order to obtain the surface integrals for the whole sea area of interest. In
addition, for later applications, a finer 1° × 1° grid is generated from the
developed 1° × 3° grid. Figure 1 shows the derived 1° × 1° grid over the sea areas,
which is also limited to the sea areas covered by Jason-1, i.e., sea areas within
$-66^\circ < \phi < +66^\circ$ and


Table 5: Periods of the nine main tidal constituents used in the global ocean tide
modeling, namely, S2, M2, N2, K1, P1, O1, Mf, Mm, and Ssa.

Tidal Constituent                               Symbol   Tidal Period (hour)

Principal solar semidiurnal constituent         S2         12.000000
Principal lunar semidiurnal constituent         M2         12.420601
Larger lunar elliptic semidiurnal constituent   N2         12.658348
Lunisolar diurnal constituent                   K1         23.934470
Solar diurnal constituent                       P1         24.065890
Lunar diurnal constituent                       O1         25.819342
Lunisolar fortnightly constituent               Mf        327.85898
Lunar monthly constituent                       Mm        661.30927
Solar semiannual constituent                    Ssa      4382.9065


Fig. 1: Latitude-longitude 1° × 1° grid over the study area, without inshore sea
regions, covered by Jason-1 satellite altimetry data within $-66^\circ < \phi < +66^\circ$
and $0^\circ < \lambda < +360^\circ$.


$0^\circ < \lambda < +360^\circ$. The maximum degree and order of the surface spherical
harmonic functions that are numerically independent within the study area was
determined from the numerically independent columns of the Gram matrix G, by
checking the Gram matrix determinant |G| (see Appendix C for details). First, we
computed the Gram matrix G for normalised surface spherical harmonics up to maximum
degree and order 25. Then we checked the Gram matrix determinant |G| for spherical
harmonic functions at the different maximum degrees and orders and found that |G|
becomes zero at degrees and orders higher than $n = 20$. Therefore, normalised
surface spherical harmonic functions up to degree and order $n_{\max} = 20$ are
selected as the numerically independent functions in our study. Figure 2 shows the
condition numbers of the Gram matrix G for different maximum degrees of the
normalised spherical harmonic expansion up to maximum degree and order
$n_{\max} = 25$. Since the intersection of the lunar orbital plane with the ecliptic
plane, known as the nodal line, rotates once every 18.6 years, this is an issue that
must be considered in ocean tide modeling. In the case of Jason-1, the variations of
the factors $f_k$ and $u_k$ within a 10-day cycle are so small that the nodal
modulations $f_k$ and $u_k$ can be computed for the average time of each 10-day
cycle. Table 6 shows the values of these corrections for the lunar tidal
constituents, namely, M2, N2, K1, O1, Mf, and Mm, which are estimated for cycle 198
on 2007.05.27, as an example. The sampling interval was selected according to the
spatial wavelength of the spherical harmonic expansion up to degree and order
$n_{\max} = 20$. Using the formula $2 \times 6400 \times \pi / n_{\max}$, 2010.62 km was
derived as the wavelength of the surface spherical harmonics of degree
$n_{\max} = 20$. Therefore, we selected a sampling interval of 502.65 km, i.e., a
quarter of the wavelength of the spherical harmonics for degree $n_{\max} = 20$.
Considering the Jason-1 data spacing, which is

Fig. 2: The values of the condition number of the Gram matrix G for normalised surface
spherical harmonic functions up to maximum degree and order $n_{\max} = 25$.


Table 6: Amplitude and phase nodal corrections for lunar tidal constituents, namely,
M2, N2, K1, O1, Mf, and Mm within cycle 198 of Jason-1 on 2007.05.27.

Tidal Constituent    f_k       u_k (deg)

M2                   0.9651     0.6636
N2                   0.9651     0.6636
K1                   1.1086     2.4034
O1                   1.1757    -2.7131
Mf                   1.4328     6.0899
Mm                   0.8775     0.0000


every 5.8 km (with 1 Hz sampling rate), we selected one sample point in every 45
data points. This sampling rate results in a 255.78 km along-track data spacing and
would select 2018856 data points from the Jason-1 satellite altimetry observations
within cycles 001 to 200 if all data were of good quality according to the Jason-1
data flags. In fact, we used the 37 flags defined on pages 25 and 26 of the AVISO
and PODAAC User Handbook [77] to find and use only those data among the selected
2018856 data points that are of good quality. The observation points are thus
selected with 255.78 km along-track spacing, in accordance with the spatial
wavelength of the spherical harmonic expansion to degree and order $n_{\max} = 20$.
Those data are used for the computation of the nine main tidal constituents, namely,
S2, M2, N2, K1, P1, O1, Mf, Mm, and Ssa, as well as MSL. Figure 3 shows the
distribution of the observation points according to the specified sampling interval
within cycle 190 including

Fig. 3: Satellite altimetry observations within cycle 190 of Jason-1 including 11061
re-sampled data points with 255.78 km along-track spacing.


11061 sea level data records. Using the Jason-1 satellite altimetry observations
within cycles 001-200, the orthonormal base functions over the study area,
i.e., the sea regions covered by Jason-1 satellite altimetry data records and bounded
by $-66^\circ < \phi < +66^\circ$ and $0^\circ < \lambda < +360^\circ$, were computed, and nine
tidal constituents, namely, S2, M2, N2, K1, P1, O1, Mf, Mm, and Ssa, as well as MSL
were modeled. Figure 4 shows a plot of the computed model for MSL. The
ellipsoidal heights shown in Fig. 4 are with respect to the WGS84 reference
ellipsoid. Figures 5 to 10 show the computed co-range maps of the six major
diurnal and semidiurnal tidal constituents S2, M2, N2, K1, P1, and O1.
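The estimation step behind these models has the weighted least-squares form of Eq. (27). The sketch below applies that form to a deliberately simplified, point-wise problem: a single synthetic sea level record made of a mean level plus M2 and S2 terms with the periods of Table 5. It is not the spatial orthonormal-function model of this study; the nodal corrections are omitted, and the sampling times, the "true" coefficients and all function names are hypothetical.

```python
import numpy as np

# Periods in hours, taken from Table 5.
PERIODS_H = {"M2": 12.420601, "S2": 12.000000}

def design_matrix(t_hours, periods_h):
    """Columns: constant (mean level), then a cos/sin pair per constituent."""
    t = np.asarray(t_hours, dtype=float)
    columns = [np.ones_like(t)]
    for period in periods_h:
        omega = 2.0 * np.pi / period           # angular frequency in rad/hour
        columns += [np.cos(omega * t), np.sin(omega * t)]
    return np.column_stack(columns)

def weighted_least_squares(A, l, C_l):
    """Estimate and covariance in the form of Eq. (27)."""
    P = np.linalg.inv(C_l)                     # weight matrix
    N = A.T @ P @ A                            # normal matrix
    x_hat = np.linalg.solve(N, A.T @ P @ l)
    C_x = np.linalg.inv(N)
    return x_hat, C_x

# Hypothetical noise-free record sampled at irregular times over one year.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 365.0 * 24.0, size=400))
truth = np.array([10.0, 0.50, -0.20, 0.10, 0.30])  # mean level, (u, v) of M2, (u, v) of S2
A = design_matrix(t, PERIODS_H.values())
l = A @ truth
C_l = 0.042**2 * np.eye(t.size)                    # 4.2 cm standard deviation, in metres

x_hat, C_x = weighted_least_squares(A, l, C_l)
amplitude_m2 = np.hypot(x_hat[1], x_hat[2])
```

In the study itself each unknown is a coefficient of an orthonormal base function rather than a single amplitude, so the design matrix has many more columns, but the algebra of the solution is the same.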



Fig. 4: Computed global MSL model with respect to WGS84 reference ellipsoid 
(contour intervals: 5 m). 

Fig. 5: Computed global co-range tidal model of S2 (contour intervals: 2.5 cm).


Fig. 6: Computed global co-range tidal model of M2 (contour intervals: 5 cm).


Fig. 7: Computed global co-range tidal model of N2 (contour intervals: 2.5 cm).


Fig. 8: Computed global co-range tidal model of K1 (contour intervals: 2.5 cm).


Fig. 9: Computed global co-range tidal model of P1 (contour intervals: 2.5 cm).


Fig. 10: Computed global co-range tidal model of O1 (contour intervals: 2.5 cm).


4 Assessments

In this section we assess the Mean Sea Level (MSL) and the six major diurnal and
semidiurnal tidal constituents, namely, S2, M2, N2, K1, P1, and O1, as well as the
three long-term tidal components, i.e., Mf, Mm, and Ssa, that are empirically
modeled at the global scale in terms of orthonormal base functions over the sea
areas covered by the Jason-1 altimetry satellite within the first six years of its
operation. To this end, and in order to verify the accuracy of the computed models,
the results of six tests are presented. None of the tests considers coastal areas,
where one would expect large errors due to inaccurate satellite altimetry
observations.

Test 1 is carried out by synthesising the sea surface observations within cycle 205
of Jason-1, which was not used in the modeling. The global distribution of those
check points is shown in Fig. 11, which exactly corresponds to the ground tracks of
Jason-1. The sea surface heights synthesised by the computed model within cycle 205
are presented in Fig. 12. The ellipsoidal heights shown in Fig. 12 are with respect
to the WGS84 reference ellipsoid. Figure 13 shows the difference between the sea
surface observations within cycle 205 and those synthesised by our computed MSL and
tidal models using Eq. (2). Table 7 gives a statistical summary of the differences.
The maximum difference between the modeled sea surface heights and the observations
is 64.14 cm, with an RMS of 2.63 cm. The maximum difference shown in Table 7 may be
caused by some short-term sea level variations. In fact, because we are comparing a
tide model obtained from six years of satellite altimetry observations with the
measurements within one cycle, such deviations can be considered quite justifiable.
This test can be regarded as a combined verification of the amplitude, phase, and
MSL models computed in this study.
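The quantities reported in the statistical summaries of this section can be computed as in the following sketch. It assumes that the difference is taken as observed minus modelled, which the text does not state explicitly; the function name and the four sample values are hypothetical.

```python
import numpy as np

def difference_summary(observed, modelled):
    """Summary statistics of the differences observed - modelled."""
    d = np.asarray(observed, dtype=float) - np.asarray(modelled, dtype=float)
    a = np.abs(d)
    return {
        "n": int(d.size),
        "min": float(d.min()),
        "max": float(d.max()),
        "mean": float(d.mean()),
        "min_abs": float(a.min()),
        "max_abs": float(a.max()),
        "mean_abs": float(a.mean()),
        "rms": float(np.sqrt(np.mean(d**2))),   # root mean square of the differences
        "std": float(d.std()),                  # scatter about the mean difference
    }

# Hypothetical check points (values in cm).
summary = difference_summary([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 5.0, 3.0])
```

By construction the RMS is never smaller than the mean absolute difference, which in turn is never smaller than the magnitude of the mean difference. The standard deviation about the mean, which some summaries report in place of the RMS, can be smaller than both when the mean difference is far from zero.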

As test 2 and test 3, we compare the MSL computed in this study with (i) 
OSUMSS95 (Yi 1995 [78], Rapp and Yi 1997 [79]), and (ii) GSFC00.1 MSS 



Fig. 11: Global distribution of the used check points for Test 1, selected within 
cycle 205 of Jason-1 satellite, including 30341 re-sampled data points with 85.26 km 
along-track spacing. 

Fig. 12: Synthesised sea level heights by the computed MSL and tide models within
cycle 205 of Jason-1 with respect to WGS84 reference ellipsoid.


Fig. 13: Difference between the sea surface observations within cycle 205 and those
synthesised by the computed MSL and tide models.


Table 7: Statistical summary of the difference between observed sea level heights
within cycle 205 of Jason-1 and those synthesised by the computed MSL and tide
models.

Total number of the check points         30341
Minimum of differences (cm)             -15.21
Maximum of differences (cm)              64.14
Mean of differences (cm)                  0.35
Minimum of absolute differences (cm)      0.00
Maximum of absolute differences (cm)     64.14
Mean of absolute differences (cm)         1.53
RMS of differences (cm)                   2.63


(Wang 2001 [80]) Mean Sea Surface (MSS). The OSUMSS95 MSS is based on satellite
altimetry sea level data provided by one year of TOPEX/Poseidon, one year of ERS-1,
one year of GEOSAT, and the first cycle of the ERS-1 satellite altimetry mission.
The values are given on a 3.75' × 3.75' grid within the geographical latitudes
$-80^\circ < \phi < +82^\circ$ globally. The OSUMSS95 MSL solution is the standard MSL
model for TOPEX/Poseidon and is provided by JPL along with other TOPEX/Poseidon
data. The details of the development of the OSUMSS95 MSL solution are given in Yi
1995 [78], where its comparisons with other solutions and its evaluations can be
found. The GSFC00.1 MSS derived by Wang 2001 [80] is computed from satellite
altimetry data from a variety of missions, including six years of TOPEX/Poseidon
data, several years of ERS-1/2 data, and GEOSAT data. The GSFC00.1 MSS is computed
on a 2' × 2' grid over the sea areas bounded by the geographical latitudes
$-80^\circ < \phi < +80^\circ$. The check points used in these two tests are the points of
the 1° × 1° grid over the study area shown in Fig. 1. The results of the two
above comparisons are shown in Figs. 14 and 15 and Tables 8 and 9. The RMS
values of the difference between the computed MSL and those of OSUMSS95



Fig. 14: The difference between the computed global MSL model in this study and 
OSUMSS95 MSL solution. 



Fig. 15: The difference between the computed global MSL model in this study and 
GSFC00.1 MSL solution. 

Table 8: Statistical summary of the difference between the computed global MSL
model in this study and OSUMSS95 MSL solution.

Total number of the check points         31521
Minimum of differences (cm)             -11.15
Maximum of differences (cm)              13.53
Mean of differences (cm)                  0.28
Minimum of absolute differences (cm)      0.00
Maximum of absolute differences (cm)     13.53
Mean of absolute differences (cm)         5.46
RMS of differences (cm)                   7.33


Table 9: Statistical summary of the difference between the computed global MSL
model in this study and GSFC00.1 MSL solution.

Total number of the check points         31521
Minimum of differences (cm)              -9.86
Maximum of differences (cm)              13.20
Mean of differences (cm)                  0.26
Minimum of absolute differences (cm)      0.00
Maximum of absolute differences (cm)     13.20
Mean of absolute differences (cm)         5.31
RMS of differences (cm)                   7.13


and GSFC00.1 MSL solutions are 7.33 cm, and 7.13 cm respectively, which 
are in good agreement with the RMS value obtained in test 1. 

As test 4 we compare the amplitudes of the models computed in this study for the six
major tidal constituents, namely, S2, M2, N2, K1, P1, and O1, with those computed by
the Goddard Space Flight Center (GSFC) via harmonic analysis of the tide gauge
observations at 104 submerged (pelagic) tide gauge stations with the global
distribution shown in Fig. 16. The data for this test were kindly supplied to the
authors by Prof. Ray from GSFC. Among these submerged tide gauge stations, 98
stations are given with all six constituents mentioned above, and the remaining six
tide gauges are provided with four tidal constituents, namely, S2, M2, K1, and O1.
The results of this comparison are shown in Figs. 17 to 22. The statistical summary
of this test is presented in Table 10. The RMS values of the difference between the
amplitudes of the tidal constituents are at the sub-centimetre level for all
constituents except M2 (1.10 cm), which supports the RMS values already derived in
test 1, test 2, and test 3. It is important to note that, since we did not have any
information about the phase of the tidal constituents at the tide gauge stations, it
has not been possible to present a comparison of the phase lags.

As test 5 we compare the amplitude models computed in this study for the six
dominant tidal constituents (S2, M2, N2, K1, P1, and O1) with those computed by the
TPXO.6.2 global ocean tide model [24]. As mentioned before, the
TPXO.6.2 global ocean tide model has been computed using inverse theory

Fig. 16: 104 submerged tide gauge stations where the amplitude of the six major
tidal constituents S2, M2, N2, K1, P1, and O1 of our model were compared with
those computed by GSFC.


Fig. 17: The differences between the amplitude of the S2 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.


Fig. 18: The differences between the amplitude of the M2 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.


Fig. 19: The differences between the amplitude of the N2 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.


Fig. 20: The differences between the amplitude of the K1 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.


Fig. 21: The differences between the amplitude of the P1 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.


Fig. 22: The differences between the amplitude of the O1 constituent computed in
this study and those computed by GSFC at the pelagic tide gauge stations.

Table 10: Statistical summary of the differences between the amplitude of the tidal
constituents S2, M2, N2, K1, P1, and O1 computed in this study and those computed
by GSFC via harmonic analysis of the tide gauge observations at the pelagic tide
gauge stations.

Tidal         Minimum     Maximum     Mean        Minimum     Maximum     Mean        RMS
Constituent   Difference  Difference  Difference  Absolute    Absolute    Absolute    (cm)
              (cm)        (cm)        (cm)        Difference  Difference  Difference
                                                  (cm)        (cm)        (cm)

S2            -2.00       1.71        +0.06       0.00        2.00        0.61        0.77
M2            -5.33       0.69        -2.11       0.10        5.33        2.13        1.10
N2            -1.44       1.50        -0.13       0.02        1.50        0.43        0.52
K1            -2.88       0.80        -0.56       0.00        2.88        0.72        0.75
P1            -1.91       3.01        +0.64       0.01        3.01        0.89        0.84
O1            -2.38       1.86        -0.33       0.00        2.38        0.63        0.72


via tide gauge and TOPEX/Poseidon data by finding an optimum balance between sea
level observations and hydrodynamic theory, and is presented on a grid of
0.5° × 0.5°. The maps and the statistical summary of the difference between the
amplitudes of the tidal constituents S2, M2, N2, K1, P1, and O1 computed in this
study and those derived from the TPXO.6.2 model, at the 31521 check points of the
1° × 1° grid (shown in Fig. 1) within the geographical area bounded by the latitudes
$-66^\circ < \phi < +66^\circ$, are given in Figs. 23 to 28 and in Table 11. The RMS values
of the difference between the two models are less than three centimetres, which is
in agreement with the results of the previous tests. It should be noted that phase
is not assessed by this test.



Fig. 23: The difference of computed S2 amplitude model from that derived by use 
of TPXO.6.2 tide solution. 

Fig. 24: The difference of computed M2 amplitude model from that derived by use
of TPXO.6.2 tide solution.


Fig. 25: The difference of computed N2 amplitude model from that derived by use
of TPXO.6.2 tide solution.


Fig. 26: The difference of computed K1 amplitude model from that derived by use
of TPXO.6.2 tide solution.


Fig. 27: The difference of computed P1 amplitude model from that derived by use
of TPXO.6.2 tide solution.


Fig. 28: The difference of computed O1 amplitude model from that derived by use
of TPXO.6.2 tide solution.


So far we have compared our MSL and tidal constituent models with other solutions
and found that the overall difference between our solutions and the existing ones is
of the order of centimetres in terms of RMS. However, those tests cannot show
whether, or how, our models improve on the existing knowledge of MSL and tidal
models. This can only be achieved if, for example, we take the tidal models derived
from tidal observations at the tide gauge stations as a benchmark against which to
test the satellite altimetry derived models. To this end we have included test 6. In
this final test we compare the capability of our model with that of TPXO.6.2 in the
synthesis of the amplitudes of the six aforementioned major tidal constituents at
the 104 submerged tide gauge stations. For this purpose we first compare the six
tidal constituents computed by TPXO.6.2 with those of the tide gauge stations. The
results of this comparison are presented in Figs. 29 to 34 and Table 12. Comparing
the RMS of the fit of our model to tidal


Table 11: Statistical summary of the difference between the amplitude of the tidal
constituents S2, M2, N2, K1, P1, and O1 computed in this study and those computed
by TPXO.6.2 at the check points of the 1° × 1° grid.

Tidal         Minimum     Maximum     Mean        Minimum     Maximum     Mean        RMS
Constituent   Difference  Difference  Difference  Absolute    Absolute    Absolute    (cm)
              (cm)        (cm)        (cm)        Difference  Difference  Difference
                                                  (cm)        (cm)        (cm)

S2            -10.04      10.72       -0.05       0.00        10.72       1.34        1.86
M2            -11.75      11.96       -1.28       0.00        11.96       2.27        2.58
N2             -5.66       7.42       +0.04       0.00         7.42       1.00        1.40
K1             -9.01      10.70       -0.49       0.00        10.70       1.40        1.87
P1             -6.28       8.10       +0.44       0.00         8.10       1.21        1.65
O1             -8.31       7.81       -0.27       0.00         8.31       1.15        1.54


constituents derived from tide gauge stations with that of TPXO.6.2 (see Table 10
and Table 12), TPXO.6.2 shows a slightly better fit for all constituents except S2.
This is to be expected: since the TPXO.6.2 global ocean tide model has been computed
using inverse theory via tide gauge and TOPEX/Poseidon data by finding an optimum
balance between sea level observations and hydrodynamic theory, it must reproduce
the tide gauge observations better. However, as the results of Table 10 and Table 12
show, our model, despite using neither tide gauge data nor assimilation of
hydrodynamic theory, still follows the accuracy of the TPXO.6.2 model very well. Of
course, this test has been made at the submerged tide gauge stations and cannot be
generalised to coastal areas.

Fig. 29: The differences between the amplitude of the S2 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.


Fig. 30: The differences between the amplitude of the M2 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.


Fig. 31: The differences between the amplitude of the N2 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.


Fig. 32: The differences between the amplitude of the K1 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.


Fig. 33: The differences between the amplitude of the P1 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.


Fig. 34: The differences between the amplitude of the O1 tidal constituent computed
by TPXO.6.2 solution and those computed by GSFC at the 104 pelagic tide gauge
stations.

Table 12: Statistical summary of the differences between the amplitudes of the tidal
constituents S2, M2, N2, K1, P1, and O1 computed by TPXO.6.2 and those computed
by GSFC via harmonic analysis of the tide gauge observations at the pelagic
tide gauge stations.

Tidal        Minimum     Maximum     Mean        Minimum     Maximum     Mean        RMS
constituent  difference  difference  difference  absolute    absolute    absolute    (cm)
             (cm)        (cm)        (cm)        difference  difference  difference
                                                 (cm)        (cm)        (cm)

S2           -2.23       2.36        +0.04       0.01        2.36        0.68        0.87
M2           -2.57       2.23        -0.23       0.01        2.57        0.70        0.90
N2           -1.50       1.68        -0.06       0.00        1.68        0.24        0.36
K1           -2.90       2.35        +0.02       0.00        2.90        0.45        0.72
P1           -1.63       1.22        -0.12       0.01        1.63        0.28        0.39
O1           -2.76       2.31        -0.09       0.00        2.76        0.37        0.63
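
The seven statistics reported per constituent in Table 12 can be computed from the per-station amplitude differences as in the following minimal sketch. The function name, the sign convention (model minus reference) and the sample amplitudes are illustrative assumptions, not values from the study.

```python
import numpy as np

def difference_summary(model_amp_cm, reference_amp_cm):
    """Summary statistics of amplitude differences (assumed: model minus reference)."""
    diff = np.asarray(model_amp_cm, float) - np.asarray(reference_amp_cm, float)
    abs_diff = np.abs(diff)
    return {
        "min": diff.min(),
        "max": diff.max(),
        "mean": diff.mean(),
        "min_abs": abs_diff.min(),
        "max_abs": abs_diff.max(),
        "mean_abs": abs_diff.mean(),
        "rms": np.sqrt(np.mean(diff ** 2)),
    }

# Hypothetical amplitudes (cm) at four stations, for illustration only.
model = [10.0, 12.5, 8.0, 20.0]
reference = [9.0, 13.0, 8.0, 22.0]
stats = difference_summary(model, reference)
```

Note that the mean difference can be close to zero while the RMS stays large, which is why the table reports both.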


5 Discussions and Conclusions 

In this study we presented an alternative approach to global tidal modeling
using satellite altimetry observations, together with its theoretical details. The usual
approach to tidal analysis using satellite altimetry data is to generate time
series of sea level observations from repeated measurements along the satellite
tracks and then to subject these time series to spectral analysis in order to
compute the tidal constituents and MSL. We
refer to this approach as the point-wise tidal modeling scheme using satellite
altimetry observations. Such ocean tide models provide high-resolution
tidal information along the satellite tracks. In contrast, our contribution provides
spatial tidal information, i.e., a 4-D tidal model: a model in time and
position, expressed in terms of orthonormalised spherical harmonics for the amplitude
and phase of the tidal constituents. Compared with the point-wise approach,
the resolution of our model may seem lower; however, it should be realised
that with the point-wise approach the high-resolution tidal information
is available only along the satellite tracks where the time series of sea level
observations are constructed. Our approach gives MSL and ocean tidal models
with uniform resolution, which is the maximum global resolution that
can be derived from, e.g., Jason-1 satellite altimetry observations, considering
its cross-track spacing. Naturally, if tide gauge observations and/or a combination
of other satellite altimetry data were used, the resolution of the solution
could be increased further. Our intention here has
been to present a tool for global ocean tidal modeling using satellite altimetry
data, which can be applied to any satellite altimetry data and/or
tide gauge observations. In addition, if our method
were combined with hydrodynamically derived tidal models, it could also
cover the area outside the satellite altimetry data coverage, which has not been
our intention in this study.

The comparisons between our tidal model and other solutions based on
different approaches showed that we have succeeded in deriving a new empirical
global tidal model, consisting of a constant part together with harmonic
parts of up to nine major tidal constituents, i.e., S2, M2, N2, K1, P1, O1,
Mf, Mm, and Ssa, from six years of Jason-1 satellite altimetry data. According
to the numerical tests, our tidal model has centimetre accuracy in estimating
the sea surface height at the global scale. The constant part of the model,
i.e., MSL, has proved to be in very good agreement with the GSFC00.1 MSL solution
[80], which is the standard JPL model for Jason-1. In addition, it is
shown how the Gram-Schmidt orthogonalisation procedure can be used to obtain
a set of orthonormal base functions from surface spherical harmonics for
global tidal modeling, which can also be used for other oceanographic modeling
applications.

Acknowledgements. The University of Tehran is gratefully acknowledged
for the financial support of this study via grant No. 8151007/1/02.
The authors gratefully acknowledge the Jet Propulsion Laboratory (JPL) for supplying
the six years of Jason-1 satellite altimetry data used in this study at
http://podaac.jpl.nasa.gov. We would also like to thank Prof. Richard
Ray for providing us with the pelagic tide gauge data, computed at the Goddard
Space Flight Center (GSFC). The fruitful comments of the anonymous reviewers
of our paper are gratefully acknowledged. We would also like to
thank the editors for their comments and corrections, which helped us to improve
our contribution.


Appendix A: Nodal Modulations 

Nodal corrections to the lunar tides, due to the 18.6-year regression of the lunar
nodal point, namely the factors $f_k$ and $u_k$, can be computed for the major lunar
tidal constituents M2, N2, K1, O1, Mf, and Mm using the formulae
provided by Doodson (1928) [72] and given in Table A.1. Because these corrections
apply only to lunar tidal constituents, the factors for the solar
tidal components S2, P1, and Ssa are set to $f_k = 1$ and
$u_k = 0$. In Table A.1, $\Omega$ represents the mean longitude of the Moon's ascending
node (in degrees), which can be computed using Eq. (A.1) [72]:

$$\Omega = 259.157 - 19.32818\,(y - 1900) - 0.05295\,(d + i) \qquad \text{(A.1)}$$

where $y$ is the year and $d$ is the number of days elapsed since January 1st in
the year $y$. The integer $i$ in Eq. (A.1) is computed as $i = [(y - 1901)/4]$, i.e.,
the integral part of $(y - 1901)/4$, which is the number of leap years between
the year 1900 and the year $y$, excluding $y$ itself because the leap day of the year $y$
is counted in $d$ [72].

Table A.1: Nodal modulations of the lunar tides in terms of the dominant lunar tidal
constituents M2, N2, K1, O1, Mf, and Mm, due to the 18.6-year regression
of the lunar nodal point [72].

Tidal        f_k                                                       u_k (deg)
constituent

M2           1.0004 - 0.0373 cos Ω + 0.0002 cos 2Ω                     -2.14 sin Ω
N2           1.0004 - 0.0373 cos Ω + 0.0002 cos 2Ω                     -2.14 sin Ω
K1           1.0060 + 0.1150 cos Ω - 0.0088 cos 2Ω + 0.0006 cos 3Ω     -8.86 sin Ω + 0.68 sin 2Ω - 0.07 sin 3Ω
O1           1.0089 + 0.1871 cos Ω - 0.0147 cos 2Ω + 0.0014 cos 3Ω     +10.80 sin Ω - 1.34 sin 2Ω + 0.19 sin 3Ω
Mf           1.0429 + 0.4135 cos Ω - 0.0040 cos 2Ω                     -23.74 sin Ω + 2.68 sin 2Ω - 0.38 sin 3Ω
Mm           1.0000 - 0.1300 cos Ω + 0.0013 cos 2Ω                     +0.00 sin Ω + 0.00 sin 2Ω - 0.00 sin 3Ω
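
As an illustration, Eq. (A.1) and the coefficients of Table A.1 can be evaluated as in the sketch below. The function and variable names are assumptions made for this example; the numerical coefficients are those of the table, and the solar constituents return $f_k = 1$, $u_k = 0$ as stated above.

```python
import math

# (f coefficients of 1, cos Ω, cos 2Ω, cos 3Ω), (u coefficients of sin Ω, sin 2Ω, sin 3Ω)
NODAL_TABLE = {
    "M2": ((1.0004, -0.0373, 0.0002, 0.0), (-2.14, 0.0, 0.0)),
    "N2": ((1.0004, -0.0373, 0.0002, 0.0), (-2.14, 0.0, 0.0)),
    "K1": ((1.0060, 0.1150, -0.0088, 0.0006), (-8.86, 0.68, -0.07)),
    "O1": ((1.0089, 0.1871, -0.0147, 0.0014), (10.80, -1.34, 0.19)),
    "Mf": ((1.0429, 0.4135, -0.0040, 0.0), (-23.74, 2.68, -0.38)),
    "Mm": ((1.0000, -0.1300, 0.0013, 0.0), (0.0, 0.0, 0.0)),
}
SOLAR = {"S2", "P1", "Ssa"}

def ascending_node_longitude(year, days_since_jan1):
    """Mean longitude of the Moon's ascending node in degrees, Eq. (A.1)."""
    i = int((year - 1901) / 4)          # integral part of (y - 1901)/4
    omega = 259.157 - 19.32818 * (year - 1900) - 0.05295 * (days_since_jan1 + i)
    return omega % 360.0                # reduce to [0, 360)

def nodal_factors(constituent, omega_deg):
    """Return (f_k, u_k in degrees) for one constituent."""
    if constituent in SOLAR:
        return 1.0, 0.0
    a, b = NODAL_TABLE[constituent]     # raises KeyError for unknown names
    w = math.radians(omega_deg)
    f = a[0] + sum(a[k] * math.cos(k * w) for k in (1, 2, 3))
    u = sum(b[k - 1] * math.sin(k * w) for k in (1, 2, 3))
    return f, u

omega = ascending_node_longitude(2002, 0)
f_m2, u_m2 = nodal_factors("M2", omega)
```

Reducing $\Omega$ modulo 360 degrees does not change the trigonometric terms; it only keeps the printed angle readable.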


Appendix B: Fully Normalised Associated Legendre 
Functions of the First Kind 

Here we define the fully normalised associated Legendre functions of the first
kind $\bar{P}_{nm}(x)$ for an arbitrary argument $-1 \le x \le 1$ by means of the following
relation [74]:

$$\bar{P}_{nm}(x) = \sqrt{(2-\delta_{m0})(2n+1)\frac{(n-m)!}{(n+m)!}}\;\frac{1}{2^n n!}\,(1-x^2)^{m/2}\,\frac{d^{\,n+m}}{dx^{\,n+m}}\,(x^2-1)^n \qquad \text{(B.1)}$$

with

$$\delta_{m0} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases} \qquad \text{(B.2)}$$

for all $n = 0, 1, \ldots, n_{max}$ and $m = 0, 1, \ldots, n$. The normalised associated Legendre
functions of the first kind $\bar{P}_{nm}(x)$ are usually computed by the recursive
formulas given in Eqs. (B.3) and (B.4) (see, for example, [81] and [82]).


$$\bar{P}_{nm}(x) = \sqrt{\frac{(2n+1)(2n-1)}{(n-m)(n+m)}}\;x\,\bar{P}_{n-1,m}(x) - \sqrt{\frac{(2n+1)(n+m-1)(n-m-1)}{(2n-3)(n+m)(n-m)}}\;\bar{P}_{n-2,m}(x) \quad \text{for all } n \neq m \qquad \text{(B.3)}$$

$$\bar{P}_{mm}(x) = \sqrt{\frac{2m+1}{2m}}\,\sqrt{1-x^2}\;\bar{P}_{m-1,m-1}(x) \quad \text{for } m > 1 \qquad \text{(B.4)}$$

with the starting values

$$\bar{P}_{00}(x) = 1 \qquad \text{(B.5)}$$

$$\bar{P}_{11}(x) = \sqrt{3\,(1 - x^2)} \qquad \text{(B.6)}$$
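
A minimal sketch of the recursive computation is given below. It assumes the standard two-term recursion for $n \neq m$ together with the usual sectorial recursion for $n = m$, started from $\bar{P}_{00} = 1$ and $\bar{P}_{11} = \sqrt{3(1-x^2)}$; the function name and the nested-list storage are illustrative choices.

```python
import math

def normalised_legendre(n_max, x):
    """Return P[n][m], the fully normalised P_nm(x), for 0 <= m <= n <= n_max."""
    P = [[0.0] * (n + 1) for n in range(n_max + 1)]
    P[0][0] = 1.0                                    # starting value P_00
    s = math.sqrt(1.0 - x * x)
    if n_max >= 1:
        P[1][1] = math.sqrt(3.0) * s                 # starting value P_11
    for m in range(2, n_max + 1):                    # sectorial terms, n = m
        P[m][m] = math.sqrt((2 * m + 1) / (2 * m)) * s * P[m - 1][m - 1]
    for m in range(0, n_max + 1):                    # remaining terms, n > m
        for n in range(m + 1, n_max + 1):
            a = math.sqrt((2 * n + 1) * (2 * n - 1) / ((n - m) * (n + m)))
            value = a * x * P[n - 1][m]
            if n - 2 >= m:                           # second term vanishes for n = m + 1
                b = math.sqrt((2 * n + 1) * (n + m - 1) * (n - m - 1)
                              / ((2 * n - 3) * (n + m) * (n - m)))
                value -= b * P[n - 2][m]
            P[n][m] = value
    return P

P = normalised_legendre(3, 0.5)
```

The recursion avoids the factorials and high-order derivatives of the closed-form definition, which is why it is preferred in practice.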

Appendix C: Gram-Schmidt Orthonormalisation


Consider a finite or countably infinite set of linearly independent functions
$\{f_i\} = \{f_1, f_2, \ldots, f_n\}$ defined over the inner product space $L_D^2$. A set
of orthonormal base functions $\{g_i\} = \{g_1, g_2, \ldots, g_n\}$ over this inner product
space can then be obtained from the set of functions $\{f_i\}$ using the Gram-Schmidt
orthonormalising procedure as follows [76]:

$$\begin{aligned}
h_1 &= f_1, & g_1 &= h_1 / \|h_1\|_{L_D^2} \\
h_2 &= f_2 - \langle f_2|g_1\rangle_{L_D^2}\, g_1, & g_2 &= h_2 / \|h_2\|_{L_D^2} \\
h_3 &= f_3 - \langle f_3|g_1\rangle_{L_D^2}\, g_1 - \langle f_3|g_2\rangle_{L_D^2}\, g_2, & g_3 &= h_3 / \|h_3\|_{L_D^2} \\
&\;\;\vdots & &\;\;\vdots \\
h_n &= f_n - \sum_{i=1}^{n-1} \langle f_n|g_i\rangle_{L_D^2}\, g_i, & g_n &= h_n / \|h_n\|_{L_D^2}
\end{aligned} \qquad \text{(C.1)}$$

where $\langle\cdot|\cdot\rangle_{L_D^2}$ denotes the inner product of two functions, defined as an integral
over the domain $D$, and $\|\cdot\|_{L_D^2}$ denotes the $L^2$-norm of a function defined over
the domain $D$. In general, one can write the above derivation as Eq. (C.2):

$$g_i = \sum_{j=1}^{i} c_{ij} f_j, \qquad i = 1, 2, \ldots, n \qquad \text{(C.2)}$$

where the $c_{ij}$ are known as the “Combination Coefficients”
of the Gram-Schmidt orthonormalising procedure. One may write
Eq. (C.2) as follows:

$$\mathbf{y} = \mathbf{C}\,\mathbf{x} \qquad \text{(C.3)}$$

where $\mathbf{x} = (f_1, f_2, \ldots, f_n)^T$, $\mathbf{y} = (g_1, g_2, \ldots, g_n)^T$, and $\mathbf{C}$ is a lower triangular
matrix containing the coefficients, which can be defined as follows:

$$\mathbf{C} = \begin{pmatrix}
c_{11} & 0 & 0 & \cdots & 0 \\
c_{21} & c_{22} & 0 & \cdots & 0 \\
c_{31} & c_{32} & c_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & c_{n3} & \cdots & c_{nn}
\end{pmatrix} \qquad \text{(C.4)}$$


Practically, an efficient method for computing the combination coefficients is
based on the Cholesky decomposition of the Gram matrix $\mathbf{G}(f_1, f_2, \ldots, f_n)$,
which for the set of linearly independent functions $\{f_i\} = \{f_1, f_2, \ldots, f_n\}$ can
be defined as follows [83]:

$$\mathbf{G} = \begin{pmatrix}
\langle f_1|f_1\rangle_{L_D^2} & \langle f_1|f_2\rangle_{L_D^2} & \cdots & \langle f_1|f_n\rangle_{L_D^2} \\
\langle f_2|f_1\rangle_{L_D^2} & \langle f_2|f_2\rangle_{L_D^2} & \cdots & \langle f_2|f_n\rangle_{L_D^2} \\
\vdots & \vdots & \ddots & \vdots \\
\langle f_n|f_1\rangle_{L_D^2} & \langle f_n|f_2\rangle_{L_D^2} & \cdots & \langle f_n|f_n\rangle_{L_D^2}
\end{pmatrix} \qquad \text{(C.5)}$$

After deriving the Gram matrix $\mathbf{G}$, the combination coefficients $c_{ij}$ can be readily
computed by Eq. (C.6) [66]:

$$\mathbf{C} = \left(\mathbf{R}^{-1}\right)^T \qquad \text{(C.6)}$$

where $\mathbf{R}^T$ is the lower triangular matrix derived from the Cholesky decomposition
$\mathbf{G} = \mathbf{R}^T\mathbf{R}$ of the Gram matrix. Equation (C.6) states that, to find the combination
coefficients $c_{ij}$, all that is required is to decompose $\mathbf{G}$ in the Cholesky sense
and find the inverse of the lower triangular matrix $\mathbf{R}^T$. It remains to evaluate
the inner products $\langle f_i|f_j\rangle_{L_D^2}$ to complete the construction of the orthonormal
functions. The inner products of the base functions $f_i$ and $f_j$ shown in
Eq. (C.5) can be defined within the inner product space as follows:

$$\langle f_i|f_j\rangle_{L_D^2} = \frac{1}{\sigma_D}\iint_D f_i\, f_j\, d\sigma \qquad \text{(C.7)}$$

where $d\sigma$ denotes the surface differential element and $\sigma_D$ represents the total sea
area of the study domain $D$, which can be derived as follows:

$$\sigma_D = \iint_D d\sigma \qquad \text{(C.8)}$$


For the computation of the double integrals in Eqs. (C.7) and (C.8), the geometric
boundaries of the domain $D$ have to be known. The inner products
can be computed with a finite element approach within the study domain
as the summation of the inner products over the cells. Like vectors in
linear algebra, the elements of a space of continuous functions can be
linearly dependent [75]. Therefore one of the most important issues to
be addressed when dealing with the Gram-Schmidt orthonormalising procedure is
to find the number of linearly independent functions $\{f_i\}$, defined over the inner
product space $L_D^2$, from which to construct the orthonormal base functions $\{g_i\}$. Selecting
more functions $\{f_i\}$ than the number of linearly independent
functions over the study domain $D$ can lead to numerical instability in the
computations and even singularity. One way to check the numerical dependence
of the functions $\{f_i\}$ is to use the Gram matrix determinant $|\mathbf{G}|$. If
the determinant $|\mathbf{G}|$ of the given set of functions $\{f_i\}$ becomes
zero, then not all of the given functions are numerically linearly independent.
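
The Cholesky route to the combination coefficients can be sketched as follows. This is an illustrative example only: the functions are sampled on cells, the cell areas stand in for the surface element, and the monomials on a one-dimensional interval are a hypothetical stand-in for spherical harmonics over the sea surface.

```python
import numpy as np

def combination_coefficients(F, weights):
    """Orthonormalise sampled functions via Cholesky of the Gram matrix.

    F       : (n_functions, n_cells) values of f_1..f_n at the cell centres
    weights : (n_cells,) cell areas, the discrete surface elements
    Returns the lower triangular matrix C and the Gram matrix G.
    """
    F = np.asarray(F, float)
    w = np.asarray(weights, float)
    area = w.sum()                        # discrete total area of the domain
    G = (F * w) @ F.T / area              # Gram matrix of area-averaged inner products
    L = np.linalg.cholesky(G)             # G = L L^T, so L plays the role of R^T
    C = np.linalg.inv(L)                  # C = (R^T)^-1 = (R^-1)^T
    return C, G

# Illustration: monomials 1, x, x^2 on [-1, 1], sampled on a uniform grid.
x = np.linspace(-1.0, 1.0, 2001)
F = np.vstack([np.ones_like(x), x, x ** 2])
w = np.full_like(x, x[1] - x[0])
C, G = combination_coefficients(F, w)
g = C @ F                                 # orthonormal functions, y = C x
gram_of_g = (g * w) @ g.T / w.sum()       # should be close to the identity
```

If the input functions are numerically dependent, the Gram matrix loses positive definiteness and `np.linalg.cholesky` raises an error, which is the practical counterpart of the determinant test described above.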


Abstract. Tides and internal waves from model studies in the South
China Sea are analysed using three techniques. We summarize results from standard
Fourier methods, continuous wavelet analysis, and the direct scattering transform.
Because Fourier and wavelet analysis are inherently linear methods, their utility
in application to nonlinear dynamics is often questioned. Nevertheless, they have
been shown to be useful in delineating first-order dynamics (for example, finding
fundamental modes). On the other hand, the scattering transform, sometimes described
as a ‘nonlinear Fourier’ technique, can in some cases succeed in elucidating
nonlinear dynamics where linear methods have proven less successful. We apply these
procedures to results from Lamb’s 2D non-hydrostatic model applied to the
South China Sea and, in some cases, to the multi-component tides used to force the
Lamb model.

Keywords: Discrete Fourier transform, Continuous wavelet transform, Direct
scattering transform, Luzon Strait, Internal gravity waves


1 Introduction 

It is widely accepted that the first recorded internal wave was that described
by J. Scott Russell. The correct mathematical framework for the phenomenon
came later with Korteweg and de Vries and their description of the KdV solutions
to the one-dimensional problem (see [1] for a brief account of the early
history of internal waves). Oceanic internal waves arise because of the naturally
occurring stratification of the ocean’s water column. As a result, internal
waves arise throughout the earth’s oceans. Well-known examples include the
internal waves observed in the Strait of Gibraltar and in the Sulu Sea. The
University of Delaware maintains a website (http://atlas.cms.udel.edu/) containing
an exhaustive catalogue of internal wave images gathered by satellite.

Internal gravity waves (IW) occur as a result of tidal flow over steep topography,
for example, coastal shelves and deep-water sills. As the tide flows over
the topography the thermocline is depressed, resulting in the generation of a
bore. The bore propagates and its leading edge steepens through nonlinear
effects. Thereafter, the bore degenerates into solitary waves through frequency
and amplitude dispersion [1, 2].

Dispersive effects become increasingly evident as the IWs propagate, causing
the amplitudes and number of oscillations to vary over time. In this sense
IWs are non-stationary, that is, their spatial and temporal scales change as
the IWs develop.

Dispersion of IWs is commonly summarized in amplitude, wavelength, and 
velocity relations. For example the amplitude of a solitary wave depression 
can be plotted against its width (or half-width) over a range of propagation 
distances [3]. The amplitudes and widths are often obtained by inspection. 
While useful (and widely used) there remains some subjectivity involved in 
determining the participant values. 

Objective analyses exist to investigate non-stationary processes. They
include statistical methods such as principal component analysis and
time-frequency analysis, including Fourier techniques, wavelets, and multiscale
analysis. Recent studies have employed these tools to study a variety of problems
(see for example [4]).

This paper describes in some detail the application of three techniques to
modeling results for IWs generated in the Strait of Luzon and propagating
into the South China Sea. They are (1) the discrete Fourier transform (DFT),
(2) the direct scattering transform (DST), and (3) the wavelet transform (WT).
Furthermore, analysis of the tidal data used in driving the IW model is included
for comparison.

Considerable interest exists in the generation and propagation
of internal gravity waves in the South China Sea. As part of the Asian Seas
Acoustics Experiment (ASIAEX), field measurements (encompassing a variety
of platforms) took place in 2001 in the South China Sea to quantify acoustic
volume interaction in the presence of solitary waves. Analysis of the field
data showed the presence of solitary waves with amplitudes up to 160 m and
phase speeds of 0.83 m/s to 1.6 m/s [5]. The more recent 2005 and 2006 Windy
Island Experiment [6] measured amplitudes of up to 250 m and phase speeds
up to 3.4 m/s. Recent modeling studies predict the occurrence of solitary
waves consistent with those observed. The internal waves appear to be
generated by deep-water sills in the Luzon Strait. The IWs travel across the
South China Sea towards the coast of China, their structure evolving as they
propagate (see Fig. 1).

In the following, the model predictions are discussed in Sect. 2. The analysis
methods are briefly described in Sect. 3 in the following order: the DFT
in Sect. 3.1, the DST in Sect. 3.2, and the WT in Sect. 3.3. The results of the
analysis are then discussed in Sect. 4. A concluding summary is given in
Sect. 5.

Fig. 1: Taiwan is located at the uppermost edge of the image. The Luzon Strait spans 
the region running south towards the Philippines (not shown) at the lower boundary. 
Internal waves can be seen in the lower left quadrant propagating westward (from 
University of Delaware, Center for Remote Sensing, url: atlas.cms.udel.edu). 

2 Model Predictions

We have undertaken a model study of the area using the 2D ocean model
developed by Kevin Lamb [2]. The model is initialized using analytic fits that
approximate real density and bathymetry data. Internal waves are generated
by tidal forcing from the Navy Coastal Ocean Model (NCOM) tidal model
[7]. The results are discussed in the following paragraphs.

Figure 2 shows results from Lamb’s 2D ocean model after 70 hours of simulation
time. The density field is shown with several isopycnal lines spanning
the upper 1000 m of ocean near the modeled sill of the Luzon
Strait (the grey patch near the leftmost edge of the figure). Because the IWs
described here begin as a tidal bore (a sharp depression of the pycnocline) and
evolve into a group of solitary waves, they can be identified throughout the
domain as IW ‘packets’. Three IW packets are easily identified, located at ranges
running east to west of -250 km, -550 km, and lastly near -700 km. The
IWs are propagating toward a shelf located on the Chinese coast.

As the IWs propagate it is apparent that the nature of the IW packet
changes qualitatively over time. The IW at -250 km is tightly packed with
numerous oscillations; at -550 km the oscillations have separated, with
large-amplitude oscillations at the leading edge of the IW packet; and still further,
at -700 km, the separation continues, but the amplitude and number of
oscillations have noticeably diminished.

Fig. 2: Isopycnals (22.5-27, sigma-t units) are shown within the internal wave field.
Results are from Lamb’s 2D ocean model for the Luzon Strait.

The goal of the analysis in this paper is to compare the results from three 
techniques for quantifying the evolution of the internal waves. Each provides 
a different and complementary view of IW behavior. 


3 Methods 

In this section the techniques used for analysis are described. The DFT and 
WT are briefly summarized along with a more detailed development of the 
direct scattering transform. The descriptions here are provided as a point of 
reference for the discussion that follows. More detailed information can be 
found in the references. 

3.1 Discrete Fourier Transform 

An estimate of the energy or power at a particular Fourier frequency or wavelength
characterizing a sequence is sought. First note that the squared value
of a sequence integrated over time is a measure of energy. In this case we have
the following expression for the energy, $E$, of a sequence $x(t)$ measured over
time $t$ with a period, $L$,

$$E = \int_0^L x(t)^2\,dt. \qquad (1)$$

The amount of power, $W$, in the sequence over the period is therefore given
by the following equation,

$$W = \frac{E}{L} = \frac{1}{L}\int_0^L x(t)^2\,dt. \qquad (2)$$

The ideas above are implemented in a straightforward way by the periodogram.
The discrete version of the periodogram, $P_{xx}$, can be written as
follows [8],

$$P_{xx}(f) = \frac{|X_L(f)|^2}{f_s L}, \qquad (3)$$

where

$$X_L(f) = \sum_{n=0}^{L-1} x_L[n]\exp(-2\pi j f n/f_s), \qquad (4)$$

is the discrete Fourier transform of the sequence $x(t)$ and $f_s$ is the sampling
frequency.
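
A minimal sketch of the discrete periodogram, evaluated at the FFT frequencies $f = k f_s / L$, is shown below. The function name and the synthetic 12-hour oscillation are illustrative assumptions, not data from the study.

```python
import numpy as np

def periodogram(x, fs):
    """Two-sided periodogram |X_L(f)|^2 / (fs * L) at the FFT frequencies."""
    x = np.asarray(x, float)
    L = x.size
    X = np.fft.fft(x)                          # DFT at f = k * fs / L
    freqs = np.fft.fftfreq(L, d=1.0 / fs)
    return freqs, np.abs(X) ** 2 / (fs * L)

fs = 1.0                                       # one sample per hour (assumed)
t = np.arange(1200) / fs                       # 50 days of hourly samples
x = np.cos(2.0 * np.pi * t / 12.0)             # synthetic 12 h oscillation
freqs, pxx = periodogram(x, fs)
peak_frequency = abs(freqs[np.argmax(pxx)])    # cycles per hour
mean_power = pxx.sum() * fs / x.size           # integrates back to mean(x**2)
```

Summing the periodogram over all frequency bins and multiplying by the bin width $f_s/L$ recovers the mean power of the sequence, which ties the discrete estimate back to the power $W$ defined above.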

As will be seen in the following sections, the DFT is more successfully
applied to problems that are linear than to nonlinear problems like internal
waves. This leads one to speculate that a ‘nonlinear’ Fourier approach
might prove more successful. The direct scattering transform is sometimes
thought of as a nonlinear Fourier method in the sense that it uncovers constituent
components in nonlinear problems.


3.2 Direct Scattering Transform 


Next, we turn to the relationship between the (periodic, inverse) scattering
transform and the (ordinary) Fourier transform, the interpretation of the former
as a nonlinear generalization of the latter, and the algorithm for computing
the DST spectrum of a data set. To this end, we begin by formulating the
Korteweg-de Vries (KdV) equation, which describes the dynamics of weakly
nonlinear dispersive waves, for the internal-wave problem.

Under the assumption that the internal solitary waves are ‘long’ and that
they are traveling in a ‘shallow’ layer (this will be made more precise below),
the governing (KdV) equation of the pycnocline displacement, which we
denote by $\eta(x,t)$, is

$$\eta_t + c_0\,\eta_x + \alpha\,\eta\,\eta_x + \beta\,\eta_{xxx} = 0, \qquad 0 \le x \le L, \quad t > 0, \qquad (5)$$

where $L\,(> 0)$ is the spatial period (i.e., the length of the domain), and the subscripts
denote partial differentiation with respect to an independent variable.
In addition, $c_0\,(> 0)$, $\alpha\,(< 0)$, and $\beta\,(> 0)$ are (constant) physical parameters
(see, e.g., Apel [1] for their interpretation). The simplest way to evaluate them
is to assume a two-layer (density) stratification [9]. Then we have

where $h_1$ and $h_2\,(> h_1)$ are the distances from the unperturbed pycnocline to
the free surface and to the ocean bottom, respectively, while $\rho_1$ and $\rho_2\,(> \rho_1)$
are the fluid densities in the top and bottom layers, respectively. Furthermore,
we are interested in the periodic initial-value problem. That is to say, given a
data set $\eta(x, t = 0)$ such that $\eta(x + L, 0) = \eta(x, 0)$ for $0 \le x \le L$, we wish to
determine its evolution $\eta(x, t)$ for $t > 0$.
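
For orientation, the KdV coefficients of a two-layer fluid can be evaluated as in the sketch below. It assumes the commonly quoted Boussinesq two-layer expressions, which may differ in detail from the expressions used by the authors; the layer depths and densities are hypothetical values chosen only to give the expected signs $c_0 > 0$, $\alpha < 0$, $\beta > 0$.

```python
import math

def two_layer_kdv_coefficients(h1, h2, rho1, rho2, g=9.81):
    """Assumed Boussinesq two-layer KdV coefficients (SI units)."""
    c0 = math.sqrt(g * (rho2 - rho1) / rho2 * h1 * h2 / (h1 + h2))  # linear phase speed
    alpha = -1.5 * c0 * (h2 - h1) / (h1 * h2)                       # nonlinear coefficient
    beta = c0 * h1 * h2 / 6.0                                       # dispersive coefficient
    return c0, alpha, beta

# Hypothetical stratification: 150 m upper layer over a 2850 m lower layer.
c0, alpha, beta = two_layer_kdv_coefficients(h1=150.0, h2=2850.0,
                                             rho1=1022.0, rho2=1027.0)
kappa = alpha / (6.0 * beta)     # nonlinearity-to-dispersion ratio
```

With a thin upper layer ($h_2 > h_1$) the nonlinear coefficient is negative, so solitary waves are waves of depression, consistent with the sign conditions stated for Eq. (5).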

The strategy for solving the periodic KdV equation by the scattering transform
can be split into two distinct steps: the direct problem and the inverse
problem. The former, which is termed the direct scattering transform (DST),
consists of solving the Schrödinger eigenvalue problem

$$\{-\partial_{xx} - \kappa\,\eta(x,0)\}\,\psi = \mathcal{E}\,\psi, \qquad (7)$$

where $\kappa = \alpha/(6\beta)$ is a nonlinearity-to-dispersion ratio, $\psi$ is an eigenfunction,
and $\mathcal{E}$ is a (real) spectral eigenvalue such that $\sqrt{\mathcal{E}}$ is a (complex) wavenumber.
For periodic signals, as we have assumed, it is well known that the eigenvalues
fall into two distinct sets [10]: the main spectrum, which we write as the set
$\{\mathcal{E}_n\}_{n=0}^{2N}$, and the auxiliary spectrum, which we write as the set $\{\mu_n^0\}_{n=0}^{N-1}$,
where $N$ is the number of degrees of freedom (i.e., nonlinear normal modes).

On the other hand, the inverse problem consists of constructing the nonlinear
Fourier series from the spectrum $\{\mathcal{E}_n\} \cup \{\mu_n^0\}$ using Abelian hyperelliptic
functions [10] or the Riemann $\Theta$-function [11]. In the former case, which is the
so-called $\mu$-representation of the scattering transform, the exact solution of
(5), subject to periodic boundary conditions, takes the form

$$\eta(x,t) = \frac{1}{\kappa}\left\{ 2\sum_{n=0}^{N-1} \mu_n(x,t) - \sum_{n=0}^{2N} \mathcal{E}_n \right\}. \qquad (8)$$

It is important to note that all nonlinear waves and their nonlinear interactions
are accounted for in this linear superposition. Unfortunately, the computation
of the nonlinear normal modes (i.e., the hyperelliptic functions $\mu_n(x,t)$,
$0 \le n \le N-1$) is highly nontrivial; however, numerical approaches have been
developed [12] and successfully used in practice [13, 14]. In addition, we note
that the auxiliary spectrum, often referred to as the hyperelliptic function
‘phases,’ is such that $\mu_n^0 = \mu_n(0,0)$ [10, 14].

Several special cases of (8) offer insight into why the latter is analogous
to the (ordinary) Fourier series. In the small-amplitude limit, i.e., when
$\max_{x,t}|\mu_n(x,t)| \ll 1$, we have $\mu_n(x,t) \sim \cos(x - \omega_n t + \phi_n)$, where $\omega_n$ is
a frequency and $\phi_n$ a phase. Therefore, if we suppose that all the nonlinear
normal modes fall in the small-amplitude limit, then (8) reduces to
the ordinary Fourier series. This relationship is more than just an analogy:
Osborne and Bergamasco [15] give a rigorous derivation of the (ordinary)
Fourier transform from the scattering transform. Next, if there are no interactions,
e.g., the spectrum consists of a single wave (i.e., $N = 1$), we have
$\mu_0(x,t) = \mathrm{cn}^2(x - \omega_0 t + \phi_0 \,|\, m_0)$, which is a Jacobian elliptic function with
modulus $m_0$. In fact, it is the well-known cnoidal wave solution of the periodic
KdV equation [1].
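
The two limits of the cnoidal wave can be checked numerically, as in the short sketch below. It uses `scipy.special.ellipj`, whose second argument is the parameter $m$ in the convention $\mathrm{cn}(x\,|\,m)$; the grid and the helper name are illustrative.

```python
import numpy as np
from scipy.special import ellipj

def cn_squared(x, m):
    """Square of the Jacobian elliptic function cn(x | m)."""
    _, cn, _, _ = ellipj(x, m)       # ellipj returns (sn, cn, dn, ph)
    return cn ** 2

x = np.linspace(-3.0, 3.0, 121)
linear_limit = cn_squared(x, 0.0)    # m = 0: sinusoidal wave, cos^2(x)
soliton_limit = cn_squared(x, 1.0)   # m = 1: solitary wave, sech^2(x)
```

Intermediate values of $m$ interpolate between these two shapes, with crests that sharpen and troughs that flatten as $m$ approaches 1.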

For the hyperelliptic representation of the nonlinear Fourier series, given
by (8), the wavenumbers are commensurable with those of the ordinary Fourier
series, i.e., $k_n = 2\pi(n + 1)/L$ ($0 \le n \le N-1$) [10, 13, 14]. However, this is not
the only way to classify the nonlinear normal modes. One can use the ‘elliptic
modulus’ (or, simply, modulus) $m_n$, termed the ‘soliton index,’ of each of the
hyperelliptic functions, which can be computed from the discrete spectrum as

$$m_n = \frac{\mathcal{E}_{2n+2} - \mathcal{E}_{2n+1}}{\mathcal{E}_{2n+2} - \mathcal{E}_{2n}}, \qquad 0 \le n \le N - 1. \qquad (9)$$

Then, each nonlinear normal mode falls into one of three distinct categories
based on its soliton index:

1. $m_n > 0.99 \Rightarrow$ solitons, in particular, $\mathrm{cn}^2(x\,|\,m = 1) = \mathrm{sech}^2(x)$;

2. $m_n > 0.5 \Rightarrow$ nonlinearly interacting cnoidal waves (e.g., moderate-amplitude
Stokes waves);

3. $m_n \ll 1.0 \Rightarrow$ radiation, in particular, $\mathrm{cn}^2(x\,|\,m = 0) = \cos^2(x)$.

Furthermore, it can be shown [13, 14] that the amplitudes of the hyperelliptic
functions are given by

$$A_n = \begin{cases} \dfrac{2}{\kappa}\,(\mathcal{E}_{\mathrm{ref}} - \mathcal{E}_{2n+1}), & \text{for solitons;} \\[2ex] \dfrac{1}{2\kappa}\,(\mathcal{E}_{2n+2} - \mathcal{E}_{2n+1}), & \text{otherwise (radiation);} \end{cases} \qquad (10)$$

where $\mathcal{E}_{\mathrm{ref}} = \mathcal{E}_{2n^*+2}$ is the soliton reference level, with $n^*$ being the largest $n$
for which $m_n > 0.99$. Then, clearly, the number of solitons in the spectrum is
$N_{\mathrm{sol}} = n^*$.
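
Once the main spectrum is known, the classification and amplitude rules above reduce to a few array operations, as sketched below. The function name, the sorted-spectrum assumption and the synthetic eigenvalues are illustrative; computing the spectrum itself is the hard part and is not attempted here.

```python
import numpy as np

def classify_modes(E, kappa, soliton_threshold=0.99):
    """Soliton indices and mode amplitudes from the main spectrum.

    E     : the 2N + 1 main-spectrum eigenvalues, sorted in ascending order
    kappa : nonlinearity-to-dispersion ratio alpha / (6 beta)
    """
    E = np.asarray(E, float)
    N = (E.size - 1) // 2
    n = np.arange(N)
    m = (E[2 * n + 2] - E[2 * n + 1]) / (E[2 * n + 2] - E[2 * n])   # soliton index
    is_soliton = m > soliton_threshold
    amplitudes = (E[2 * n + 2] - E[2 * n + 1]) / (2.0 * kappa)      # radiation branch
    if is_soliton.any():
        n_star = np.flatnonzero(is_soliton).max()                   # largest soliton mode
        E_ref = E[2 * n_star + 2]                                   # soliton reference level
        amplitudes[is_soliton] = 2.0 * (E_ref - E[2 * n[is_soliton] + 1]) / kappa
    return m, is_soliton, amplitudes

# Synthetic spectrum with two narrow bands (solitons) and one wide band.
E = [-4.0, -3.999, -1.0, -0.995, 0.0, 0.5, 1.0]
m, is_soliton, amplitudes = classify_modes(E, kappa=-0.05)
```

A negative $\kappa$, as for internal waves of depression, gives negative amplitudes for all modes in this example.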

To summarize: the DST consists of computing the amplitudes and degrees 
of nonlinearity (moduli) of the nonlinear normal modes. Furthermore, if the 
KdV equation governs (at least to a good approximation) the evolution of the 
data set, then the DST spectrum characterizes the dynamics for all time. If
that is not the case, then the DST provides an instantaneous projection of the
dynamics onto the solution space of the periodic KdV equation, giving us a
nonlinear characterization of the data set at a particular instant of time.

In addition, the DST has been successfully employed in the Fourier-like 
decomposition of data from inherently nonlinear physical phenomena such as 
shallow-water ocean surface waves [13], laboratory-generated surface waves 
[14], and internal gravity waves in a stratified fluid [16]. Also, we note that 
the numerical implementation of the DST used in this paper is a modified 
version of Osborne’s automatic algorithm [10], as described in [17]. 

Finally, we quantify the assumption of ‘long, shallow-water’ waves made
above, so that the limits of the DST’s applicability are clear. The latter assumption
amounts to requiring that the largest wave amplitude (denoted
by $\eta_{\max} = \max_{x,t}|\eta(x,t)|$) is much smaller than the top layer’s depth, i.e.,
$\eta_{\max}/h_1 \ll 1$, and that the characteristic width of the waves is much greater
than the top layer’s depth, i.e., $h_1/\Delta \ll 1$, where $\Delta$ can be taken to be, e.g.,
the largest half-width of the waves [1, 9]. Also, we may compute the (spatial)
Ursell number of a data set, which is defined [14] as



( 11 ) 


This gives an additional measure of the ‘nonlinearity’ of a wave train, with 
Ur ~ 1 being the limit of linear theory. 

3.3 Wavelet Transform 

While characterizing the scale of internal waves is important, it is equally 
important to know how that scale changes over time. In this regard the wavelet 
transform proves particularly useful. Here we discuss the application of the 
continuous wavelet transform and leave aside other multiscale analysis which 
can be useful in analyzing IW [18]. The general development of the continuous 
wavelet transform is well described in the literature [19, 20]. 

The wavelet transform $W_g(s, x)$ of a spatial sequence $f(x)$ can be defined
as follows,



(12) 

where the wavelets $g_{sx'}(x)$ are generated from shifted and scaled versions
of the mother wavelet $g(x)$,

$$g_{sx'}(x) = \frac{1}{\sqrt{s}}\; g\!\left(\frac{x - x'}{s}\right), \qquad (13)$$

where $s$ and $x'$ are real values that scale and shift the wavelet, respectively.
Note that the wavelet transform is in fact a convolution of the wavelet $g$ with
the sequence $f(x)$.

In the work described here the continuous wavelet transform is used with
the mother wavelet chosen to be the Morlet wavelet. This provides two advantages.
First, as noted above, the WT is a convolution of the wavelet with the
sequence to be analyzed. Thus, the WT can be implemented using the convolution
property of the Fourier transform, that is, convolution in space becomes
a product of transforms in Fourier space. This property is employed in the
algorithm described by Torrence and Compo [21], which is used here. In this
formulation the discrete wavelet transform is the inverse Fourier transform of
the following product,

$$W_n(s) = \sum_{j=0}^{N-1} \hat{f}_j\;\hat{g}^*(s k_j)\,\exp(i k_j n\,\delta x), \qquad (14)$$

where $\hat{f}$ and $\hat{g}$ are the Fourier transforms of the sequence and wavelet,
respectively. The second advantage the Morlet wavelet affords is that there is
an explicit relationship between the wavelet scale $s$ of a sequence and the
standard Fourier components. This allows a direct comparison between the
familiar DFT Fourier components and those obtained via the wavelet transform.
The power of a wavelet component $W_n$ is given by the amplitude squared, $|W_n(s)|^2$.
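
The Fourier-space implementation of Eq. (14) can be sketched as follows. The Morlet transform, its normalisation and the scale-to-wavelength factor are the standard forms for a nondimensional frequency $\omega_0 = 6$, which is an assumption here since the value used in the study is not stated; the test signal is synthetic.

```python
import numpy as np

def morlet_cwt(f, dx, scales, omega0=6.0):
    """Continuous wavelet transform via the FFT (Morlet mother wavelet)."""
    f = np.asarray(f, float)
    N = f.size
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)          # angular wavenumbers k_j
    f_hat = np.fft.fft(f)
    W = np.empty((len(scales), N), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the Morlet wavelet, nonzero for k > 0 only
        g_hat = np.where(k > 0,
                         np.pi ** -0.25 * np.exp(-0.5 * (s * k - omega0) ** 2),
                         0.0)
        g_hat = g_hat * np.sqrt(2.0 * np.pi * s / dx)  # unit-energy normalisation
        W[i] = np.fft.ifft(f_hat * np.conj(g_hat))     # inverse transform of the product
    return W

def fourier_wavelength(scales, omega0=6.0):
    """Equivalent Fourier wavelength of each Morlet scale."""
    return 4.0 * np.pi * np.asarray(scales) / (omega0 + np.sqrt(2.0 + omega0 ** 2))

dx = 1.0                                               # km, hypothetical spacing
x = np.arange(500) * dx
f = np.cos(2.0 * np.pi * x / 25.0)                     # synthetic 25 km oscillation
wavelengths = np.geomspace(5.0, 100.0, 40)
scales = wavelengths / fourier_wavelength(1.0)
W = morlet_cwt(f, dx, scales)
power = np.abs(W) ** 2
peak_wavelength = wavelengths[np.argmax(power.mean(axis=1))]
```

Because the transform keeps the position index $n$, the same array `power` also shows where along the sequence each wavelength is present, which is what the DFT cannot provide.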


4 Analysis 

The analytic methods just described can be applied to both linear and nonlin¬ 
ear problems. We discuss application to linear problems using tide data and 
to nonlinear problems using internal waves. 

Results from analysis of the data here can be grouped into two broad 
regimes. Characteristic scales can be discerned over illustrative segments of 
data which are short compared to the complete data set. Other patterns can 
only be made out if relatively long sequences are examined. Hence, in the 
following the analysis is divided into short and long data sequences. First, we 
investigate short segments of data in Sect. 4.1 and then longer data segments 
in Sect. 4.2. 

4.1 Analysis: Short Data Sequences 

In this section we look at the frequency and spatial characteristics of short 
data segments using the DFT, DST, and WT. 

4.1.1 DFT 

For analysis, we use several datasets. First, we will look at DFT analysis of 
tidal velocities. This data is linear and is a good example of the strength of 
the DFT. We will then analyze a segment of the internal wave shown in Fig. 2 
using the DFT, the DST, and the wavelet transform. Finally, we will analyze 
a longer segment of the IW field using the windowed DFT and the WT and 
compare the results. 

First consider the tidal velocity over time (days) and its Fourier spectrum
shown in Fig. 3. The upper panel shows tidal velocity taken from the NCOM
tides model [7], sampled roughly every hour (59 min) over about 50 days.
Qualitatively, we see many high-frequency oscillations on the order of a day
and a long (14 day) component modulating the entire time period. The lower
panel is the power spectrum of the sequence. Note the large amplitudes near
0.04 and 0.08 $h^{-1}$; these components correspond to the 24 h and 12.4 h tidal
components, respectively, as expected. The long (14 day) modulation component is the
so-called fortnightly effect known to exist in this tide and can be seen very near
the left edge of the plot.
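
The correspondence between the quoted spectral peaks and the tidal periods is simple arithmetic, sketched below. The interpretation of the 14-day modulation as the beat between two semidiurnal lines, and the M2 and S2 periods used, are assumptions added for illustration.

```python
# Convert the spectral peaks quoted in the text from frequency to period.
diurnal_period_h = 1.0 / 0.04          # peak near 0.04 per hour
semidiurnal_period_h = 1.0 / 0.08      # peak near 0.08 per hour

# Beat period of two semidiurnal lines (assumed periods, in hours).
m2_period_h = 12.42
s2_period_h = 12.00
beat_days = 1.0 / (1.0 / s2_period_h - 1.0 / m2_period_h) / 24.0
```

The peaks are read from a plot, so the resulting periods of about 25 h and 12.5 h agree with the 24 h and 12.4 h tidal components only to within the reading accuracy.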

The DFT results for the tidal velocity clearly show the tide’s component 
parts and are a good example of the utility of the DFT. In this case, tidal 
velocity, the dynamics are very nearly linear and hence it is a good candidate 
for analysis with the DFT. 

We now consider a segment of the internal wave field from Fig. 2. In the
upper panel of Fig. 4 the segment shows the oscillating displacement of a
single isopycnal (25.1 in sigma units), which lies near a depth of -150 m when
undisturbed and includes 8 distinguishable troughs, the largest of which, at
-530 km, reaches nearly -400 m. The segment is restricted to include only
the internal wave ‘packet’ spanning the range between -550 km and -350 km.
Here and in the coming discussion we will repeatedly examine this internal
wave segment with a number of techniques. The lower panel shows the Fourier
spectrum for the IW. It can be seen by inspection that the separation of the
troughs of the IW in the upper panel is on the order of 25 km. The DFT
spectrum shows the IW’s Fourier components unevenly spread over a range
near 25 km. This moderate spectral resolution, which gives the power in a range
of components rather than in clear peaks, presents a limitation in the application
of the DFT to IWs. The Welch spectrum is overlaid on the periodogram for
comparison. While smoother, it nevertheless suffers from the same problem with
resolution.
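
The limited resolution can be illustrated with a short calculation of the wavelengths that a DFT of a segment of this length can represent. The sketch below, in Python (NumPy), takes the 200 km segment length from the text; the 1 km grid spacing is an assumption (the spacing is not stated here) and affects only the shortest resolvable wavelengths.

```python
import numpy as np

segment_km = 200.0     # packet length from the text (-550 km to -350 km)
dx_km = 1.0            # hypothetical grid spacing; not given in the text
n = int(segment_km / dx_km)

k = np.fft.rfftfreq(n, d=dx_km)          # wavenumbers in cycles per km
wavelength_km = 1.0 / k[1:]              # skip the zero wavenumber

# DFT wavelengths that fall between 21 km and 35 km
near = wavelength_km[(wavelength_km >= 21.0) & (wavelength_km <= 35.0)]
print(np.round(near, 1))
```

Only a handful of DFT wavelengths (200/6, 200/7, 200/8 and 200/9 km) fall in this band, so troughs whose spacing varies around 25 km necessarily spread their power over neighbouring components. Averaging shorter sub-segments, as in the Welch estimate, makes this spacing coarser still.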
Fig. 3: Upper panel shows average tidal velocity in the Luzon Strait. The lower panel
shows the Fourier components for the tide obtained from the DFT.


4.1.2 DST 

Because of the physical basis of the DST, it only makes sense to apply it to
nonlinear wave phenomena that are governed (at least to a good approximation)
by the KdV equation. Therefore, we only consider the application of the
DST to the internal wave segment. The spectrum of the middle wave packet
(recall Fig. 2) is shown in Fig. 5.

Fig. 4: Upper panel shows short segment of IW field taken from Fig. 1. Lower panel
shows Fourier components calculated from the DFT. The periodogram (solid curve)
and the Welch spectrum (dashed curve) are shown for comparison. Note that 25 km
component ‘disappears’ as a result of smoothing the Welch spectrum.

Fig. 5: Scattering transform spectrum of the middle wave packet (range -375 km
to -545 km). Panel title: Direct scattering transform spectrum of the middle IW
packet; x-axis: wavenumber (m^-1) x 10^-3.

The DST finds 28 solitons in the data set traveling on a ‘reference level’ of
-158.8 m; the Ursell number is 3.209. Physically, the reference level (shown
as a black, dashed horizontal line in Fig. 5 and those that follow below) can 
be understood as the location of the undisturbed isopycnal in the absence of 
anything but non-interacting (well-separated) solitons. All this means is that 
the amplitudes of the soliton nonlinear oscillation modes are measured with 
respect to this reference level. 

What is interesting about the DST spectrum is that it not only immediately
captures the six solitary waves visible in the data set but also finds a
number of ‘hidden’ modes that cannot be found by observation. Moreover,
the DST spectrum reveals that the visible solitary waves fall into two distinct
groups: the leftmost three waves and the one near -405 km form one
group, while the ones near -445 and -425 km are part of another group.
We can make this distinction because of the trends in the amplitude versus
wavenumber plot of the spectrum given in Fig. 5. In other words, we see that
the first four amplitudes’ absolute values decrease essentially linearly with the
wavenumber, and the slope of the line connecting them is about that of the
line which connects the crests of the leftmost three waves (and the one
near -405 km if it is ‘moved’ to be next to the latter ones). However, after
the fourth mode in the spectrum, the trend of the nonlinear oscillations’
amplitudes changes abruptly, which signifies a break in the pattern, and the rest
of the modes cannot be grouped with the first four.

It may be surprising that there are 28 solitons in the spectrum of this
wave packet. One must keep in mind, however, that these internal waves are highly
nonlinear structures, while the KdV equation, which is the basis of the DST,
governs the weakly-nonlinear limit. Therefore, we cannot say with certainty
that there are precisely 28 solitons present in the data. However, we can,
with a high degree of certainty, conclude that there are ‘hidden’ solitons and
that solitons represent the energetic part of the spectrum (i.e., moderately
nonlinear waves and radiation are hardly present, if at all).

4.1.3 WT 

Here we will consider the wavelet transform of the IW segment previously
discussed and shown in the upper panel of Fig. 4 (the WT of tidal data will be
considered in a later section). The results are shown in Fig. 6. Note that the
x-axis duplicates that shown with the data sequence (between -550 and -350 km).
In this sense the spectral power is co-located with its associated IW. The
y-axis shows the Fourier wavelength associated with the wavelet scale. The
spectrum shows components with significant energy spread over the region
between -460 km and -540 km, peaking near -510 km at a wavelength of
about 30 km. The location of these components indicates that the primary
characteristic lengths associated with the first few depressions are about 30 km.

Fig. 6: Wavelet spectrum for internal wave segment. Darker colors represent greater
wavelet power.

Note the small areas of increased wavelet power near the ranges -530 km,
-500 km, and -470 km. These components are associated with individual troughs
of the IW. This can be understood by recalling that the wavelet transform is
a convolution of the wavelet with the waveform being analyzed. These small
peaks come about when the probing wavelet becomes situated inside the IW
troughs. As a consequence, the wavelet scale is on the order of the width of
the troughs of the IW. This feature is not observed with the DFT.
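
The convolution picture can be sketched in a few lines of Python (NumPy). The example below is illustrative only: the isopycnal is a hypothetical train of three sech^2 depressions placed at the ranges quoted above, and a Mexican-hat (Ricker) wavelet at a single scale is assumed, which is not necessarily the wavelet used for Fig. 6.

```python
import numpy as np


def ricker(points, scale):
    """Mexican-hat (Ricker) wavelet sampled on `points` grid steps."""
    x = np.arange(points) - (points - 1) / 2.0
    a = (x / scale) ** 2
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - a) * np.exp(-a / 2.0)


# Hypothetical isopycnal: three sech^2 depressions on a flat background.
dx_km = 0.5
x_km = np.arange(-600.0, -400.0, dx_km)
trough_centres_km = [-530.0, -500.0, -470.0]
half_width_km = 4.0
displacement_m = np.zeros_like(x_km)
for centre in trough_centres_km:
    displacement_m -= 100.0 / np.cosh((x_km - centre) / half_width_km) ** 2

# Wavelet transform at one scale: convolution of the signal with the wavelet.
scale_steps = half_width_km / dx_km          # scale comparable to trough width
wavelet = ricker(int(10 * scale_steps) + 1, scale_steps)
coeffs = np.convolve(displacement_m, wavelet, mode="same")
power = coeffs ** 2

# Positions of the three largest local maxima of wavelet power
interior = (power[1:-1] > power[:-2]) & (power[1:-1] > power[2:])
peak_idx = np.where(interior)[0] + 1
top3 = peak_idx[np.argsort(power[peak_idx])[-3:]]
print(np.sort(x_km[top3]))
```

When the wavelet scale is comparable to the trough width, the largest wavelet power occurs where the wavelet sits inside a trough, which is the behaviour described above.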

The great advantage that the continuous wavelet transform enjoys is the
ability to isolate the characteristic scales of IWs. Although its resolution does
not match that seen in the Fourier analysis of the tidal data
(Fig. 3), the WT is nevertheless able to locate the characteristic scales of
IWs. Moreover, the WT localizes these scales in space. This feature holds the
possibility that the scale of the IW can be tracked over time. This phenomenon
is investigated more closely in the following sections.

4.2 Analysis: Long Data Sequences 

As previously noted, internal waves are nonstationary in the sense that their
characteristic spatial and temporal scales evolve over time. At its inception
the IW packet is a single depression (or bore) in the isopycnal, which, upon
propagation, develops into a series of solitary waves through nonlinear
dispersion. These solitons grow in amplitude and separate, effectively lengthening
the packet. The fully developed IW packet analyzed in the above sections is
of this type. Further propagation leads to an IW packet whose constituent
solitons have diminished in both number and amplitude. Thus, the three IW
packets observed in Fig. 2 can be thought of as snapshots of a single IW packet
over its lifetime. This pattern is repeated to varying degrees in most naturally
observed IWs. The evolutionary aspect of IW dynamics is a good example of
a nonstationary system. For this reason, it is instructive to investigate long
data sequences to elucidate this behavior.

In the following section, the windowed discrete Fourier transform is used
to investigate tidal data. Subsequently, the long data sequence of the IW field
is analyzed with the WT and the DST.

4.2.1 Windowed Discrete Fourier Transform 

We noted earlier that the DFT does not discern variations in time, in the sense
that the components discovered via the DFT are taken to occur throughout the
tidal time sequence. Some resolution in time can be gained by repeatedly applying
the DFT within a short window which is ‘slid’ along the waveform being
analyzed. This is the idea behind the windowed Fourier transform (WFT).
Fig. 7a shows the results of applying the WFT to tidal data obtained from
NCOM and used to drive the Lamb model to generate internal waves. The
x-axis is time and ranges over about 50 days and the y-axis is the usual
Fourier frequency (h^-1). With the spectrogram we can see variation over time
of the frequency components of the tide. There are at least two peaks in the
spectrum, one at 0.04 and another at 0.08 h^-1, associated with the 24 and 12.4
hr tidal periods respectively. In analysis of the tidal data the WFT yields good
time-frequency information.

Fig. 7: Tidal data analysis results: (a) windowed Fourier transform of tidal data,
(b) wavelet transform of tidal data. Darker colors indicate greater relative power.
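
The sliding-window idea described in this subsection can be sketched as follows in Python (NumPy). This is an illustrative implementation under stated assumptions, not the procedure used to produce Fig. 7a: the function name `windowed_dft`, the Hann taper, the 4-day window, the 1-day hop and the synthetic tide (hourly samples, invented amplitudes) are all hypothetical choices.

```python
import numpy as np


def windowed_dft(signal, dt, window_len, hop):
    """Slide a Hann-tapered window along `signal` and take the DFT of each piece.

    Returns window-centre times, frequencies and a (time, frequency) power array.
    """
    taper = np.hanning(window_len)
    starts = np.arange(0, len(signal) - window_len + 1, hop)
    power = np.empty((len(starts), window_len // 2 + 1))
    for row, start in enumerate(starts):
        piece = signal[start:start + window_len]
        piece = (piece - piece.mean()) * taper
        power[row] = np.abs(np.fft.rfft(piece)) ** 2
    times = (starts + window_len / 2.0) * dt
    freqs = np.fft.rfftfreq(window_len, d=dt)
    return times, freqs, power


# Hypothetical tide: hourly samples for 50 days, a semidiurnal constituent whose
# amplitude is modulated over a 14-day cycle plus a steady diurnal constituent.
dt_h = 1.0
t = np.arange(50 * 24) * dt_h
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * t / (14 * 24))
tide = envelope * np.cos(2 * np.pi * t / 12.42) + 0.6 * np.cos(2 * np.pi * t / 24.0)

times, freqs, power = windowed_dft(tide, dt_h, window_len=96, hop=24)
dominant = freqs[np.argmax(power, axis=1)]     # strongest frequency per window
print(power.shape, dominant.min(), dominant.max())
```

Each row of `power` is the spectrum of one window, so plotting it against `times` and `freqs` gives a spectrogram of the kind described above. The window length sets the trade-off: a longer window sharpens the frequency resolution but blurs the timing of changes in the tide.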


4.2.2 Wavelet Transform 

The wavelet transform can be considered a refinement of the WFT. Recall
that the mother wavelet is scaled and shifted along the waveform to be tested,
yielding the wavelet spectrum. In this sense the WFT represents a crude
wavelet which is a square wave that can be scaled and shifted along the
waveform, the resulting spectrum varying with both time and frequency. Noting
this similarity, we expect the wavelet transform to
yield results similar to those of the spectrogram.

Figure 7b shows the wavelet transform for the tidal data previously analyzed.
Again we can make out the diurnal and semi-diurnal components of
the tide. Qualitatively, the results are almost identical to those found using
the WFT (except that the WFT is plotted against frequency, the reciprocal of
the period).

Lastly, we consider a series of three internal wave packets and analyze the
result with wavelets. In Fig. 8 the upper panel shows the series of IWs and
the lower panel the associated wavelet spectrum. The results show the
generation and evolution of the internal wave packets as they propagate toward
the leftmost edge of the domain.

The general features we saw previously (Fig. 6) are repeated for each of
the packets (the middle packet being the one previously described). The peaks
in the spectrum most closely associated with the leading edge of each of the
IW packets (-670 km, -500 km, -225 km) correspond to the characteristic
wavelengths of the individual IW packets. The length increases from about
10 km for the first packet to about 30 km for the middle packet and roughly
35 km as the packet reaches the left boundary. The increase reflects the gradual
increase in distance between troughs within each packet.

Referring to the peaks in the wavelet spectrum allows us to draw attention
to the wavelet component with the maximum intensity. However, the peaks are
surrounded by areas of high (relative to the background) intensity, reflecting
the fact that the spectrum is spread across wavelengths and ranges. In Fig. 8
the concentration of spectral intensity ‘spreads’ with time: the
intensity of the spectrum for the IW packet at -225 km is well concentrated
in range and wavelength, at -500 km the intensity measurably broadens, and
finally the intensity of the packet at -670 km is quite diffuse. The cause of
this general dissipation could be attenuation of the internal wave packet
through either physical or numerical mechanisms.
Fig. 8: Wavelet transform spectrum for long time internal wave sequence.


Finally note the peaks associated with individual troughs within each of 
the three packets where the wavelets ‘fit’ just inside the individual troughs. 
The characteristic wavelength for these troughs does not appear to change 
significantly over the propagation distance. This indicates the general shape 
of the troughs is somewhat constant throughout the domain. 
4.2.3 Direct Scattering Transform

Because the DST identifies the KdV-based nonlinear normal modes of the data 
set and their evolution, it would only make sense to perform a DST analysis of 
the entire isopycnal if it were governed by the KdV equation. Clearly, that is 
not the case as the solitary waves can ‘age’. Therefore, in this subsection, we 
perform a ‘windowed’ scattering transform analysis of the full data set. That 
is to say, we take three snapshots of the evolution of the internal solitary waves 
and compute the DST spectrum of each. This approach is similar to that of 
Zimmerman and Haarlemmer [16], who computed the DST spectrum of their 
data at different times in order to identify the nonlinear normal modes that 
are invariants of the motion (i.e., those that do not change in time). 

To this end, in the top panel of Fig. 9, we show the DST analysis of
the leftmost (farthest away from the sill) wave packet of the isopycnal under
consideration. The middle wave packet, which was the subject of Sect. 3.2,
is given in Fig. 5. The rightmost (closest to the sill) wave packet’s DST
spectrum is shown in the bottom panel of Fig. 9. For the leftmost packet, the
DST finds 27 solitons traveling on a reference level of -151.3 m; the Ursell
number of the data set is 2.008. On the other hand, for the rightmost wave
packet, the DST finds 52 solitons traveling on a reference level of -163.9 m; the
Ursell number is 5.718.

Fig. 9: Top and bottom panels display the scattering transform spectra of the leftmost
(range -150 km to -368 km) and rightmost (range -547 km to -750 km) wave
packets of the isopycnal under consideration. Panel titles: Direct scattering
transform spectrum of the leftmost IW packet (top) and of the rightmost IW
packet (bottom).

The first thing we notice is the similarity in the trend of the nonlinear normal
modes’ amplitudes (i.e., the rate of increase/decrease of the amplitudes
with the wavenumber) in each snapshot. As was the case for the middle internal
wave packet we discussed earlier, the largest amplitudes, which decrease
quickly (in absolute value) with the wavenumber, are easily seen to be those
of the solitary waves visible in the data set. Then, there is a large number
of ‘hidden’ modes whose amplitudes’ absolute values decrease approximately
linearly with the wavenumber. Furthermore, the spectrum of each wave packet
is clearly dominated by solitons, as the amplitudes of the nonlinear normal
modes with moduli m_n < 0.99 are very small (in absolute value) in comparison
with the soliton modes. Again, we emphasize that we cannot be certain
whether there are precisely 52 or 28 solitons in the respective internal wave
packets. Nonetheless, the DST provides concrete evidence of the nonlinear
and evolving nature of the packets. Moreover, there is no doubt that solitons
are the most prominent part of the spectra, and that their number decreases
as the internal wave packets propagate away from the sill.

Furthermore, though for the first wave packet (see top panel of Fig. 9) the
non-soliton normal modes are mostly radiation, as their moduli are m_n ≪ 1,
for the middle and rightmost wave packets (see Fig. 5 and the bottom panel of
Fig. 9) we observe more nonlinear normal modes to the right of the ‘soliton
cutoff’ of approximately 1.3 x 10^-3 m^-1. This correlates with the fact that the
Ursell number of the rightmost wave packet is the largest of the three: almost
twice that of the middle packet and three times that of the leftmost packet.
Moreover, this result is consistent with the fact that the farther from the sill
the internal waves are, the closer their dynamics are to the KdV (and,
eventually, linear) ones.


5 Summary 

Data and model studies of internal gravity waves show that their generation
and evolution are accompanied by changes in their characteristic spatial and
temporal scales. This nonstationary, dispersive behavior arises from nonlinear
elements in IW dynamics. In IW studies, dispersion is commonly summarized
in amplitude, wavelength, and velocity relationships. Often these are
constructed by inspection. In the work described here, objective, analytic tools
are employed to investigate the nonstationary behavior of IWs.

In this paper, three methods have been applied to internal wave data
generated by Lamb’s [2] model designed to simulate IWs observed in the Luzon
Strait and South China Sea. They are the following: (1) the discrete Fourier
transform, (2) the direct scattering transform and (3) the wavelet transform.
The analysis has been applied to linear tide data, to ‘short’ internal wave data
and to ‘long time’ segments. Each method yields positive results which in some
cases are complementary (DFT, WT) and in some cases unique (DST).

The DST allows for a truly nonlinear analysis of the internal waves and
provides a measure of the applicability of the Korteweg-de Vries equation.
While the DST does not necessarily give precise quantitative results that can
be used for predictive purposes, it provides a ‘genuinely nonlinear’
decomposition of the data set. In particular, the DST spectra of the snapshots of a wave
packet at different stages of its evolution allow us to see the ‘nonlinear mode
conversions’ taking place over time and provide an understanding of solitary
wave ‘aging’ in terms of these modes (notice the decrease in the number of
nonlinear oscillation modes for wavenumbers between 0 and ≈ 5 x 10^-4).

The discrete Fourier transform, the windowed Fourier transform and the
wavelet transform represent a continuum of Fourier-based approaches to
investigating IWs. Each yields the Fourier components of the internal waves
and, further, the WT (thought of in terms of a refined WFT) provides a
view of how these modes change over time. In this regard, we have seen that
the WT elucidates the evolving character of internal waves, thus providing
a time/frequency picture of the evolving dynamics of the internal wave over
long periods.

In summary, each technique can be seen to provide positive, recognizable
details of IW dynamics. It is not uncommon to find that what is obvious to
the naked eye cannot be verified by reasonable examination. Thus, while the
results described in this paper fall short of a complete quantitative description
of IW dynamics, the fact that these methods support and expand on
what can be seen ‘by eye’ is a nontrivial result. Certainly, the entire catalogue
of analyses applicable to the investigation of internal waves has not been
addressed here. However, those described here represent a span of means by
which to investigate internal wave dynamics. Moreover, it is reasonable to
expect that further work will yield more quantitative results.

Acknowledgements. This research was supported by the Office of Naval 
Research under PE 62435N, with technical management provided by the Naval 
Research Laboratory. I.C. acknowledges a fellowship from the ONR/ASSE 
Naval Research Enterprise Intern Program. 


References 

1. J. R. Apel: 2003. A new analytical model for internal solitons in the ocean. 
J. Phys. Oceanogr. 33, 2247-2269. 

2. K. G. Lamb: 1994. Numerical experiments of internal wave generation by strong 
tidal flow across a finite amplitude bank edge. J. Geophys. Res. 99(C1), 843-864. 

3. A. C. Warn-Varnas, S. A. Chin-Bing, D. B. King, Z. Hallock, and J. A. Hawkins:
2003. Ocean-acoustic solitary wave studies and predictions. Surveys in
Geophysics 24, 39-79.




4. A. Grinsted, J. Moore, and S. Jevrejeva: 2004. Application of the cross wavelet 
transform and wavelet coherence to geophysical time series, Nonlinear Processes 
in Geophysics 11, 561-566. 

5. T. F. Duda, J. F. Lynch, J. D. Irish, R. C. Beardsley, S. R. Ramp, and 
C.-S. Chiu: 2004. Internal tide and nonlinear wave behavior at the Continental 
Slope in the North China Sea. IEEE J. Ocean Eng. 29, 1105-1130. 

6. S. R. Ramp, 2006. Private communication. 

7. S.-Y. Chao, D.-S. Ko, R.-C. Lien, and P.-T. Shaw: 2007. Assessing the West 
Ridge of Luzon Strait as an internal wave mediator. J. Oceanogr. 63 (No.6), 
897-911. 

8. Signal Processing Toolbox User’s Guide for use with MATLAB. The
MathWorks, Inc. (2002).

9. A.R. Osborne, T.L. Burch: 1980. Internal solitons in the Andaman Sea. Science 
208, 451-460. 

10. A.R. Osborne: 1994. Automatic algorithm for the numerical inverse scattering
transform of the Korteweg-de Vries equation. Math. Comput. Simul. 37,
431-450.

11. A.R. Osborne, M. Serio, L. Bergamasco, and L. Cavaleri: 1998. Solitons, cnoidal 
waves and nonlinear interactions in shallow-water ocean surface waves. Physica 
D 123, 64-81. 

12. A.R. Osborne and E. Segre: 1991. Numerical solutions of the Korteweg-de Vries
equation using the periodic scattering transform μ-representation. Physica D 44,
575-604.

13. A.R. Osborne, E. Segre, G. Boffetta, and L. Cavaleri: 1991. Soliton basis states
in shallow-water ocean surface waves. Phys. Rev. Lett. 67, 592-595.

14. A.R. Osborne and M. Petti: 1994. Laboratory-generated, shallow-water surface 
waves: analysis using the periodic, inverse scattering transform. Phys. Fluids 6, 
1727-1744. 

15. A.R. Osborne and L. Bergamasco: The solitons of Zabusky and Kruskal
revisited: perspective in terms of the periodic spectral transform. Physica D 18,
26-46.

16. W.B. Zimmerman and G.W. Haarlemmer: 1999. Internal gravity waves: analysis
using the periodic, inverse scattering transform. Nonlin. Process. Geophys.
6, 11-26.

17. I. Christov: 2008. Internal solitary waves in the ocean: analysis using the
periodic, inverse scattering transform. Math. Comput. Simul., arXiv:0708.3421, (in
press).

18. S. Jevrejeva, J. C. Moore, and A. Grinsted: 2003. Influence of the Arctic Oscillation
and El Niño-Southern Oscillation (ENSO) on ice condition in the Baltic Sea:
The wavelet approach. J. Geophys. Res. 108, D21, 4677-4688.

19. P. Kumar and E. Foufoula-Georgiou: 1994. Wavelet analysis in geophysics: an
introduction. Wavelets in Geophysics, E. Foufoula-Georgiou and P. Kumar, eds.
Academic Press, San Diego, pp. 1-45.

20. L.H. Kantha and C.A. Clayson: 2000. Appendix B: Wavelet Transforms.
Numerical Models of Oceans and Oceanic Processes. L.H. Kantha and C.A. Clayson,
eds. Academic Press, San Diego, pp. 786-818.

21. C. Torrence, G.P. Compo: 1998. A practical guide to wavelet analysis. Bull. Am. 
Met. Soc. 79, 61-78. 


Crustal Deformation Models 
and Time-Frequency Analysis of GPS 
Data from Deception Island Volcano 
(South Shetland Islands, Antarctica) 


María Eva Ramírez, Manuel Berrocoso, María José González, and
Alberto Fernández

Laboratorio de Astronomía, Geodesia y Cartografía. Departamento de
Matemáticas. Facultad de Ciencias. Universidad de Cádiz,
mariaeva.ramirez@uca.es; manuel.berrocoso@uca.es;
majose.gonzalez@uca.es; alberto.fernandez@uca.es

Abstract. We have applied wavelet techniques to analyze GPS time-series
data from the REGID geodetic network, deployed at Deception Island Volcano
(South Shetland Islands, Antarctica). In the present analysis wavelets are
used to detect periodic components and to filter the data. The high frequency
components can be associated with the orbital period of the satellites and with
local tidal effects, whereas the medium frequencies seem to be related to the
weather cycle. The wavelet filtering procedure is based on the SURE estimator,
and a considerable reduction in noise is achieved, particularly in the Up
component, whose deviation is reduced to that of the horizontal components
before the denoising. An estimation of the displacements in the network for
the period 2001/02 - 2005/06 is also included.


Keywords: South Shetland Islands, Volcano monitoring, Continuous wavelet 
transform, Wavelet denoising 


1 Introduction and Motivation 

The Global Positioning System (GPS) is widely used to study many geoscience 
problems such as the determination of the motion of the Earth’s tectonic plates 
or volcanic monitoring. 

Most studies describing GPS data analysis for the evaluation of crustal
deformation estimate a single station position over a 24 h period. This
processing strategy is suitable when deformation rates are small and vary slowly
over the years, but a sub-daily position is required for some other applications
such as volcanic monitoring. In fact, volcanic activity is usually associated with
significant ground deformation, which makes GPS an ideal
technique for both monitoring and studying active volcanoes.

The usual approach in GPS data processing is limited in the sense that the
coordinates of the stations are estimated only once per day, while a much
higher solution rate is obviously needed if a rapid deformation event is to
be detected. Moreover, the 24 h sampling interval ignores any variation within a
day as well as any periodic component which affects the data with a period
shorter than one day.
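
The loss of sub-daily information can be illustrated with a small synthetic example in Python (NumPy). Everything in this sketch is assumed for illustration (a 10-day series, a 0.1 mm/day drift, a 12 h periodic term of 5 mm amplitude, and the variable names); it is not REGID data and not the processing used in this chapter.

```python
import numpy as np

# Hypothetical 10-day position series: slow drift plus a 12 h periodic term.
# All amplitudes are invented for illustration.
dt_h = 0.5                                   # 30 min sessions
t_h = np.arange(0, 10 * 24, dt_h)
drift_mm = 0.1 * t_h / 24.0                  # 0.1 mm/day trend
periodic_mm = 5.0 * np.sin(2 * np.pi * t_h / 12.0)
position_mm = drift_mm + periodic_mm

# One solution per day: average the 48 half-hour values in each day.
daily_mm = position_mm.reshape(-1, 48).mean(axis=1)
daily_drift_mm = drift_mm.reshape(-1, 48).mean(axis=1)

# Amplitude of the 12 h term recovered from the 30 min series via the DFT
spectrum = np.fft.rfft(position_mm - position_mm.mean())
freq = np.fft.rfftfreq(len(t_h), d=dt_h)
idx = np.argmin(np.abs(freq - 1.0 / 12.0))
amp_12h_mm = 2.0 * np.abs(spectrum[idx]) / len(t_h)

print("largest trace of the 12 h term in the daily solutions (mm):",
      np.max(np.abs(daily_mm - daily_drift_mm)))
print("12 h amplitude seen in the 30 min series (mm):", round(amp_12h_mm, 2))
```

In this synthetic case the daily averages retain only the drift, because the 12 h term averages out over each day, whereas the 30 min series still contains it at close to its full amplitude.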

In this chapter we present a new methodology to analyze GPS time series
when a high sampling rate is considered in the data processing. With this
sampling, three objectives arise:

To detect the periodic components in the data that are ignored with
processing sessions of 24 h.

To filter the data in order to decrease the scattering and to remove the
detected periodicities.

To evaluate the displacement in the area of study from the surveyed
stations, in order to better understand the pattern of its behaviour.

To tackle these questions we have considered the time-frequency
decomposition of the data that the wavelet transformation provides [1].

The analyzed GPS data correspond to the surveying of the geodetic network 
deployed on Deception Island Volcano (South Shetland Islands, Antarctica) 
during the last Antarctic campaigns (2003-04 - 2005-06) for monitoring its 
volcanic activity. 

Geodetic studies on Deception Island (Fig. 1) were begun in the 1950s by
Chilean, Argentinean and British scientists from the bases on the island. This
work was focused on updating the existing cartography of the island and
was interrupted at the end of the 1960s, when the volcanic eruptions that
took place forced the evacuation of the bases. The geodetic and geophysical
tasks remained suspended until 1986, when monitoring of the island was
reestablished by Argentinean and Spanish researchers. In January 1992, a noticeable
increase in seismic activity was detected, with 900 registered events and 4
felt earthquakes. Gravity and magnetic anomalies suggested that the volcano
reactivation was due to a magmatic injection at 2 km depth at Fumaroles Bay
[2]. This activity started to subside in February 1992. Regarding previous
geodetic work, Berrocoso [3] conducted repeated GPS surveying from
1989-90 to 1995-96, obtaining an absolute deformation rate of 4 cm/year and
3.24 cm/year at BARG and FUMA stations respectively, and a value of 2.91
cm/year and 0.89 cm/year at BALL and PEND stations. A subduction process
was observed around the island, with values of 1.94 cm/year for BARG and
FUMA and 0.94 cm/year and 1.74 cm/year for PEND and BALL respectively
(Fig. 2).

At the end of 1998 only a few events were recorded, but this behaviour changed
suddenly at the beginning of 1999, with the occurrence of significant
seismo-volcanic activity. This crisis included volcano-tectonic (VT) and long period
(LP) events together with volcanic tremors. Most of the registered events were
localized between Fumaroles and Telephone Bay, some of which were large
enough (3-4) to be felt. When the campaign finished, the seismic activity was
still high [4].

Although GPS data are available from 1989, in this paper we focus only on
those collected during the last Antarctic campaigns, owing to the improvement
achieved in the storage capabilities of the receivers. In particular, we will
concentrate on the time-frequency analysis of the GPS time series rather than
on the interpretation of the detected deformation. Further information about
the displacement rate and the deformation on the island and in the South
Shetland Islands environment can be found in [5, 6, 7, 8]. These works give a
detailed description of the geodetic work on Deception Island and present the
horizontal deformation models up to the 2002/2003 campaign.
The models were obtained by considering 24 h sessions in the processing of
the GPS observations, but no further analysis of the data was developed. The
processing strategy applied in this work, which considers 30 min sessions,
allows not only the estimation of the deformation models but also the in-depth
study of the time series, which constitutes the innovative aspect of this study.
The chapter is organized as follows: the first section includes a brief description
of the tectonic setting of the South Shetland Archipelago. The following
section deals with the GPS surveying campaigns and the processing strategy.
The next section presents the methodology for analyzing the data, where wavelets
are applied for two different purposes: to detect the periodic components and
to filter the data. The section after that applies this
methodology to the GPS time series. The detected periodicities are discussed
and the denoising of the data is described in detail. The last part of that
section addresses the estimation of the deformation occurring in the studied
area. The deformation pattern estimated from the surveyed stations is shown,
and a comparison of the results obtained from the filtered and non-filtered
data is presented. A final discussion and a brief overview of future work follow
in the last section of the chapter.


2 Tectonic Setting

Deception Island, located north-west of the Antarctic Peninsula, is situated
in the Bransfield Strait marginal basin that separates the South Shetland
Islands from the Antarctic Peninsula. It is a horseshoe-shaped stratovolcano,
whose main volcano-tectonic feature is a central flooded depression
related to a spreading centre in the Bransfield Strait. This central caldera
has been traditionally described as a collapse caldera, originated after one or
more voluminous eruptions [9]. However, other models for Deception Volcano
suggest that it was formed progressively by passive normal faulting along
nearly orthogonal normal faults that cut across the island according to a
regional trend [10]. The Bransfield Strait is a consequence of the rifting and
separation of the South Shetland Archipelago and the Antarctic Peninsula,
which generated a chain of volcanoes along the Bransfield Sea, three of which
emerged: Deception, Pinguin and Bridgeman [2, 11]. The complex structure
of this environment is characterized by the interaction of four small tectonic
plates (the Scotia Arc, the Atlantic and Pacific Plates, and the Phoenix Plate)
which cause a subduction process from the South Shetland Islands Plate to the
Phoenix Plate and an expansion and thermal ascent between the South
Shetland Islands Plate and the Atlantic Plate along the Bransfield Sea (Fig. 1).
Deception Island is also located at the confluence of two major tectonic
structures: the SW end of the Bransfield and a southerly extrapolation of
the Hero Fracture Zone. In addition to these interesting tectonic features,
Deception Island has exhibited continuous volcanic activity with confirmed
eruptions in 1800, 1812, 1842, 1871, 1912, 1956, 1967, 1969, and 1970. The
last eruptive process took place from 1967 to 1970 around Telephone Bay and
Mount Pond (Fig. 2), along the main fracture in the NNE-SSW direction. It
gave rise to a 40 m high cone and an alignment of five craters in the north
of the island, causing the collapse of the Chilean Base in Pendulum Cove and
the destruction of the British Base in Whaler’s Bay by a lahar.



Fig. 1: Tectonic and geographical setting of Deception Island Volcano.

Fig. 2: Distribution of the stations of the geodetic network on Deception Island
Volcano. Stations marked with a square are monitored throughout each campaign,
whereas the others are surveyed for a few days every year. The locations of the
last eruptive process are also indicated on the map.


At present, the main surface evidence of volcanic activity on the island
is the presence of fumarolic areas with 100 °C and 70 °C gaseous emissions
at Fumaroles and Whaler’s Bay, respectively, 100 °C hot soils at Hot Hill,
and 45 °C and 65 °C thermal springs in Pendulum Cove and Whaler’s Bay
[12, 13]. Remarkable seismic activity is also recorded in some areas of the
island, with a mean of 1000 events detected during the campaigns. The high
seismicity reflects rift expansion, a subduction process and volcanism.

Due to the complex geodynamic characteristics of Deception Island and of 
the South Shetland environment previously described, the volcanic activity 
on the island is related to regional tectonics. 


3 Data Description and Processing 

The analyzed GPS data come from the GPS surveys of the geodetic network on
Deception Island Volcano, REGID. At present, the GPS network consists of 12
stations distributed around the inner bay of the island, as shown in Fig. 2.
The network has been surveyed episodically every austral summer since 1989.
Since the quality of the GPS data has improved in recent years, this study
focuses only on the data from the last campaigns (2003-04, 2004-05, and
2005-06), which offer better quality and a larger number of observations. In
fact, for the time-frequency analysis we have considered only data from the
FUMA and PEND stations, at Fumaroles Bay and Pendulum Cove. These stations,
together with the BEGC station at the Spanish Antarctic Base Gabriel de Castilla,
provide a global view of the deformation occurring on the island and
constitute the fundamental stations of the network, being surveyed throughout
each campaign. Table 1 shows the duration of the 2003-04, 2004-05,
and 2005-06 campaigns and the number of days each station of the REGID
geodetic network was surveyed.

GPS data were processed with the BERNESE v4.2 scientific software [14],
according to the following scheme:

1. First, the absolute coordinates of the reference station BEGC of the
REGID geodetic network were obtained by processing it together with
BEJC and the IGS station OHI2, at the Chilean Base O’Higgins (150 km
away);

2. Coordinates for the rest of the stations of the REGID geodetic network
were calculated through radial baselines, keeping BEGC fixed
as the reference station.

Regarding the configuration of the processing, tropospheric parameters
and ambiguity resolution were calculated using the GPSEST subroutine
of the BERNESE v4.2 software. Tropospheric parameters were estimated
hourly from the Saastamoinen tropospheric model as suggested by [15] for


Table 1: Duration of the 2003-04, 2004-05 and 2005-06 campaigns and number of
days each station was surveyed.

Campaign  Dates (mm/dd/yyyy)     BEGC BARG FUMA PEND BALL COLA GEOD UCA1 TELE BOMB CR70 GLAN
2003-04   12/01/2003-01/01/2004    31    6   31   30    3    6    9    3    3    5    3    1
2004-05   12/02/2004-02/02/2005    71    8   37   40    0    0    9    0    3    5    3    0
2005-06   12/17/2005-02/26/2006    72    7   71   71    5    5    4    6    6    5    2    5


GPS data processing in Antarctic regions; ambiguities were solved by applying
the quasi ionosphere free (QIF) strategy. Ambiguities for both L1 and
L2 frequencies were solved after fixing the reference station and estimating
ionospheric parameters for every epoch. An ionosphere-free solution was
obtained for every session. Sessions were 24 h long for obtaining the
coordinates of the reference station, and 30 min long for estimating the
coordinates of the rest of the REGID stations. The solutions of every session
were adjusted by means of the ADDNEQ routine to obtain the coordinates of the
reference station, whereas the solutions for the rest of the stations of the
network were not adjusted, in order to keep a larger number of data in the
time series to be analyzed. Precise orbits and pole files were used for the
entire procedure.
Once the data are processed, the resulting geocentric coordinates
(X, Y, Z) are transformed into a local topocentric system (E, N, U) centred
on each surveyed station. Therefore, three components are obtained for
each station: the East and North horizontal components, and the vertical (or
Up) component, the latter being less reliable due to the well-known loss of
accuracy of the GPS system in the elevation component (Fig. 3).
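
The change from geocentric to topocentric components can be illustrated with the standard rotation for a reference point of given geodetic latitude and longitude. The sketch below is only an illustration: the function name `ecef_to_enu`, the sample displacement and the approximate site coordinates are assumptions introduced here and are not taken from the processing described above.

```python
import math

def ecef_to_enu(dx, dy, dz, lat_deg, lon_deg):
    """Rotate a geocentric coordinate difference (dx, dy, dz) into local
    East, North, Up components at a reference point with geodetic
    latitude lat_deg and longitude lon_deg (both in degrees)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    sin_lat, cos_lat = math.sin(lat), math.cos(lat)
    sin_lon, cos_lon = math.sin(lon), math.cos(lon)
    east = -sin_lon * dx + cos_lon * dy
    north = -sin_lat * cos_lon * dx - sin_lat * sin_lon * dy + cos_lat * dz
    up = cos_lat * cos_lon * dx + cos_lat * sin_lon * dy + sin_lat * dz
    return east, north, up

# Hypothetical example: a 10 mm displacement along the geocentric Z axis,
# seen from a point with illustrative coordinates near Deception Island.
e, n, u = ecef_to_enu(0.0, 0.0, 0.010, -62.98, -60.65)
print(f"E = {e * 1000:.2f} mm, N = {n * 1000:.2f} mm, U = {u * 1000:.2f} mm")
```

Because the transformation is a pure rotation, the length of the displacement vector is the same in both systems; only its split into horizontal and vertical parts changes.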


[Figure 3: panels "East Component" and "Up Component"; time axis from 02/12/03 to 31/12/03]

Fig. 3: East and Up components for FUMA station at Fumaroles Bay corresponding
to the 2003/04 campaign, obtained after processing the data with a 30 min
sampling rate.



4 Methodology 

Geophysical time series are usually generated by processes so complex that
their behaviour is difficult to model in the time domain. For many years,
Fourier analysis has been the essential tool to study certain characteristics of
data in the frequency domain. Nevertheless, the resulting frequency components
are not localized in time; in fact, they are infinitely spread in time. The
Windowed Fourier Transform allows a restricted partition of the time-frequency
plane, considering boxes or windows of fixed shape, but this non-varying shape
of the boxes restricts the time location of the frequency content in the data.
The wavelet transform presents an alternative to classical frequency analysis,
since the decomposition bases considered for its computation provide a
partition of the time-frequency plane with boxes of varying shape, depending
on the frequency resolution we are interested in. This time-frequency
transformation allows the decomposition of the data along successive scales,
which are related to different frequency resolutions. Therefore, the small
scales involve the analysis of the high frequency components of the data, and
they correspond to boxes of small time support and wide frequency support in the
partition of the time-frequency plane, whereas the larger scales correspond
to the lower frequencies, related to boxes with a wide time support and a
small support in the frequency domain (Fig. 4). In recent years, several
works have used wavelet techniques in the analysis of GPS
data. In [16], wavelet decomposition is used to better estimate the secular
trend, by rejecting those wavelet coefficients related to the high frequencies
and keeping a smoothed version of the original data. Other studies focus
on filtering the data to reduce the multipath effect [17] or on the
detection of sudden changes or the occurrence of events in the data [18].

In this work the wavelet transform was applied to the GPS time series in
order to (1) detect the periodic components and (2) denoise the data to
reduce their scatter. In particular, the periodic components of the data are
associated with the scales that concentrate the maximum level of energy in
the time-frequency decomposition. Concerning the filtering of the data, the
usual wavelet denoising techniques are based on the application of a threshold
to the wavelet coefficients. Coefficients above the threshold are kept,
whereas the others are set to zero, and an estimate of the denoised signal
is obtained from the filtered coefficients. Nevertheless, this strategy needs to
be slightly modified when the data are too noisy. In fact, a high contribution
of noise to the signal spreads the total energy of the data over all
the wavelet coefficients, so that few of them exceed the threshold, which
yields an oversmoothed denoised signal. We have considered Stein’s Unbiased
Risk Estimator for the filtering [19], with the value of the threshold depending
not only on the wavelet coefficients but also on a comparison of the signal
energy with an estimate of the energy of the noise.



Fig. 4: Decomposition of the time-frequency plane with the Euclidean (a) and Fourier
(b) bases, with the windowed Fourier basis (c), and by means of a wavelet basis (d).


4.1 Wavelet Background

The wavelet theory relies on the existence of two functions, $\psi$ (mother
wavelet or wavelet function) and $\varphi$ (father wavelet or scale function),
verifying certain properties [1], which provide a decomposition of the data in
the time-frequency plane along successive scales. This time-frequency
transformation depends on two parameters, the scale parameter $s$, which is
related to the frequency, and the time parameter $u$, related to the
translation of both functions $\psi$ and $\varphi$ in the time domain.

The continuous wavelet transform of a function $f \in L^2(\mathbb{R})$ is
defined as the inner product of $f$ with the translated and dilated version of
$\psi$,

$$Wf(u,s) = \langle f, \psi_{s,u} \rangle = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{t-u}{s}\right) dt, \qquad (1)$$

where $\psi^{*}$ denotes the conjugate of $\psi$. Expression (1) can be seen as
the convolution $f \star \psi_s(u)$ of the function $f$ with $\psi_s$, where

$$\psi_s(t) = \frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{-t}{s}\right). \qquad (2)$$

Varying $u$ in time, this convolution provides the frequency components
$\{d_{s,u}\}_u$ of the signal associated with the scale $s$ and the time
location $u$, that is, the detail coefficients of $f$ at scale $s$ and time
$u$. A real wavelet transform preserves energy as long as the wavelet function
$\psi$ satisfies a weak admissibility condition given by Calderón in [20].

On the other hand, the convolution of $f$ with $\varphi_s$, where $\varphi_s$
is defined from $\varphi$ in an analogous way to $\psi_s$,

$$f \star \varphi_s(u) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\varphi^{*}\!\left(\frac{t-u}{s}\right) dt = \langle f, \varphi_{s,u} \rangle, \qquad (3)$$

provides a smoothed version of $f$, given by the approximation coefficients of
$f$ at scale $s$.

The time-frequency resolution of the wavelet decomposition depends on the
time-frequency spread of the considered wavelet function $\psi$. In fact, since
the time and frequency supports are inversely related, as the scale parameter
$s$ decreases, shorter periods (and higher frequencies) are captured by the
wavelet transform. Conversely, the larger the scale parameter $s$ is, the
longer the periods and the lower the frequencies detected. Nevertheless, this
resolution or location in the time-frequency plane is bounded from below by
Heisenberg’s Uncertainty Principle [1], which states that perfect location in
both domains simultaneously is not possible.

Particularly important are the dyadic versions of both functions $\psi$ and
$\varphi$, where the scale $s$ is a power of 2, that is, $s = 2^j$,
$j \in \mathbb{Z}$, and their integer translations $u = k \in \mathbb{Z}$. This
importance lies in the fact that the set
$\{\psi_{j,k}\}_{(j,k) \in \mathbb{Z}^2}$, where $\psi_{j,k}$ is given by

$$\psi_{j,k}(t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - 2^j k}{2^j}\right), \qquad (4)$$

constitutes an orthonormal basis of the space $L^2(\mathbb{R})$ of functions of
finite energy. Thus, any function $f \in L^2(\mathbb{R})$ can be expressed in
terms of $\{\psi_{j,k}\}_{(j,k) \in \mathbb{Z}^2}$ in a non-redundant way,

$$f(t) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t). \qquad (5)$$

The term $\langle f, \psi_{j,k} \rangle$ is defined as the wavelet coefficient
$d_{j,k}$, so expression (5) can be rewritten as

$$f(t) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_{j,k}\, \psi_{j,k}(t). \qquad (6)$$

Under these considerations, each j-approximation can be seen as the sum of
the (j + 1)-approximation of a coarser level and the (j + 1)-details which
appear in the scale j but disappear in the approximation at scale j + 1.

The decomposition determined by the dyadic wavelet can be understood as
a discretization of the continuous wavelet transform given by (1). Mallat’s
algorithm [1] calculates the wavelet coefficients with a cascade of discrete
convolutions and subsamplings, which is especially useful for saving
computational effort and for certain wavelet applications such as data
compression or denoising. Hence, the choice between these transformations
(continuous or discrete) depends on the purpose of the data analysis.
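
The cascade of filtering and subsampling can be sketched with the Haar wavelet, chosen here only because its filters have two taps and the code stays short; it is not one of the bases used later in this chapter, and the function names and the sample signal are illustrative assumptions.

```python
import math

def haar_step(approx):
    """One level of the cascade with the Haar filters: low-pass and
    high-pass filtering followed by subsampling by a factor of 2."""
    half = len(approx) // 2
    r = 1.0 / math.sqrt(2.0)
    coarser = [r * (approx[2 * k] + approx[2 * k + 1]) for k in range(half)]
    details = [r * (approx[2 * k] - approx[2 * k + 1]) for k in range(half)]
    return coarser, details

def haar_dwt(signal, levels):
    """Return (coarsest approximation, [details at level 1, 2, ...]).
    The signal length is assumed to be divisible by 2**levels."""
    approx = list(signal)
    all_details = []
    for _ in range(levels):
        approx, details = haar_step(approx)
        all_details.append(details)
    return approx, all_details

def haar_inverse_step(coarser, details):
    """Rebuild the finer approximation from the coarser approximation
    plus the details of the same level."""
    r = 1.0 / math.sqrt(2.0)
    finer = []
    for a, d in zip(coarser, details):
        finer.extend([r * (a + d), r * (a - d)])
    return finer

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(signal, levels=3)
print("coefficients per level:", [len(d) for d in details])
```

Each level halves the number of coefficients, and adding the details back to the coarser approximation with `haar_inverse_step` recovers the finer one, which is the relation between successive approximations described above.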

The representation of the wavelet coefficients as a function of scale and time
location yields the wavelet spectrum, providing a decomposition of the data
in the time-frequency plane. Since every horizontal line in the wavelet
spectrum is associated with a frequency component in the data, this
decomposition constitutes a useful tool to detect dominant periodic components.
Representing the scales of the decomposition versus the corresponding energy of
the coefficients, that is, the scalegram of the data, the periodic components
are identified with the scales whose associated energy reaches a maximum in
this representation.
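
As a minimal sketch of this idea, the scalegram can be computed as the sum of squared coefficients at each scale and then scanned for local maxima. The function names and the small set of coefficients are hypothetical and serve only to illustrate the procedure.

```python
def scalegram(coeffs_by_scale):
    """Energy of the wavelet coefficients at each scale.
    coeffs_by_scale maps a scale to the list of coefficients at that scale."""
    return {s: sum(c * c for c in coeffs) for s, coeffs in coeffs_by_scale.items()}

def local_maxima(energy_by_scale):
    """Scales whose energy is strictly larger than at both neighbouring scales."""
    scales = sorted(energy_by_scale)
    peaks = []
    for prev, cur, nxt in zip(scales, scales[1:], scales[2:]):
        if (energy_by_scale[cur] > energy_by_scale[prev]
                and energy_by_scale[cur] > energy_by_scale[nxt]):
            peaks.append(cur)
    return peaks

# Hypothetical coefficients at five scales (not taken from the GPS data).
coeffs = {
    1: [0.5, -0.5, 0.5, -0.5],
    2: [1.5, -1.0, 1.0, -1.5],
    3: [0.5, 1.0, -0.5, -1.0],
    4: [2.0, -1.5, 1.5, -2.0],
    5: [1.0, -1.0, 0.5, -0.5],
}
energies = scalegram(coeffs)
print("dominant scales:", local_maxima(energies))
```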

In order to validate the detection of periodicities in the data from the
maxima of the wavelet scalegram, we have applied the same procedure to two
synthetic sinusoidal signals,

$$y_1 = \cos\omega_1 t, \qquad y_2 = \cos\omega_1 t + \sin\omega_2 t, \qquad (7)$$

where $\omega_1 = 2\pi/864000$ and $\omega_2 = 2\pi/86400$. The time span was
set to [0 : 1800 : 1800 · 1340] in order to cover a total of 27 days, and the
sampling period was set to $\Delta = 1800$ s to get the same number of data and
the same sampling rate as some of the experimental samples.

Fig. 5: Synthetic signal $y_1 = \sin\omega_1 t$, where $\omega_1 = 2\pi/T_1$, with $T_1 = 864000$ (a);
wavelet spectrum (b) and scalegram (c), whose maximum value is reached for the
scale s = 377 that captures the oscillation of the sinusoidal signal.


With these considerations, the wavelet spectrum and the scalegram were
calculated. For the first synthetic signal (Fig. 5a) the scalegram (Fig. 5c)
reveals that the maximum value is reached at the scale $s = 377$, which
corresponds to a frequency of $f_s = 9.824 \cdot 10^{-7}$ Hz, that is, a period
of $T \approx 11.7$ days, according to the scale-frequency relation given by

$$f_s = \frac{F_c}{s \cdot \Delta}, \qquad (8)$$

where $f_s$ denotes the characteristic frequency for scale $s$, $F_c$ is the
central frequency of the wavelet, related to its maximum oscillation, and
$\Delta$ is the sampling period. For the second synthetic signal
$y_2 = \cos\omega_1 t + \sin\omega_2 t$ (Fig. 6a) the wavelet spectrum captures
both oscillations (Fig. 6b), which correspond to the local maxima in the
scalegram (Fig. 6c). The local maxima are reached at scales $s_1 = 38$ and
$s_2 = 377$, which are related to the frequencies
$f_{s_1} = 9.746 \cdot 10^{-6}$ Hz and $f_{s_2} = 9.824 \cdot 10^{-7}$ Hz, and
therefore to periods $T_1 \approx 1.1$ and $T_2 \approx 11.7$ days, respectively.

Fig. 6: Synthetic signal $y_2 = \sin\omega_1 t + \cos\omega_2 t$, where $\omega_1 = 2\pi/T_1$ and
$\omega_2 = 2\pi/T_2$, with $T_1 = 864000$ and $T_2 = 86400$, respectively (a); wavelet spectrum (b)
and scalegram (c), whose local maxima are reached for the scales s = 38 and s = 377
that capture the oscillations of the sinusoidal signal.
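
The scale-frequency relation (8) is easy to evaluate numerically. In the sketch below the central frequency is an assumption: the text does not state $F_c$, and the value 2/3 is simply the one implied by the quoted pair $s = 377$, $f_s = 9.824 \cdot 10^{-7}$ with $\Delta = 1800$ s. The function and constant names are likewise illustrative.

```python
def scale_to_frequency(scale, sampling_period, center_frequency):
    """Characteristic frequency f_s = F_c / (s * Delta) of relation (8)."""
    return center_frequency / (scale * sampling_period)

SECONDS_PER_DAY = 86400.0
DELTA = 1800.0        # sampling period in seconds, as in the text
F_C = 2.0 / 3.0       # assumed central frequency, inferred from the quoted values

for s in (38, 377):
    f_s = scale_to_frequency(s, DELTA, F_C)
    period_days = 1.0 / f_s / SECONDS_PER_DAY
    print(f"s = {s:3d}: f_s = {f_s:.3e} Hz, period = {period_days:.2f} days")
```

Under this assumption the two scales map to periods slightly longer than one day and slightly shorter than twelve days, in line with the values quoted above.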


4.2 Wavelet Denoising 


In practice, experimental data are usually affected by different sources of noise
that mask the signal of interest. The experimental data $X_n$ can be written as

$$X_n = f_n + W_n, \qquad (9)$$

where $f_n$ is the pure signal and $W_n$ is Gaussian noise with variance
$\sigma^2$. Most wavelet denoising techniques are based on the comparison of the
amplitude of the wavelet coefficients with a previously defined threshold $T$.
Coefficients whose amplitudes are above $T$ are kept or smoothed,
while those whose amplitudes are below $T$ are set to zero. Thus, the
filtered signal $\tilde{f}$ is estimated by reconstruction or synthesis from the
filtered wavelet coefficients. Since each wavelet coefficient is associated with
a time-frequency location, this denoising is adapted to the local regularity of
the data. This fact constitutes the main advantage of the wavelet procedure
over the usual band-pass filtering techniques, where the suppression of a certain
frequency band affects the whole time domain of the signal.
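
The two usual variants of this rule, keeping the surviving coefficients unchanged (hard thresholding) or shrinking them by T (soft thresholding), can be sketched as follows. The function names and the sample coefficients are illustrative assumptions.

```python
def hard_threshold(coeffs, t):
    """Keep coefficients whose amplitude exceeds t; set the others to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Shrink coefficients above t towards zero by t; set the others to zero."""
    shrunk = []
    for c in coeffs:
        if abs(c) > t:
            shrunk.append(c - t if c > 0 else c + t)
        else:
            shrunk.append(0.0)
    return shrunk

coeffs = [3.0, -0.5, 1.5, -2.0]
print(hard_threshold(coeffs, 1.0))
print(soft_threshold(coeffs, 1.0))
```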

The first question that arises in wavelet-based denoising is the choice of a
value for the threshold $T$. Donoho and Johnstone [21] proved that for the
value $T = T_U$,

$$T_U = \sigma \sqrt{2 \ln N}, \qquad (10)$$

where $N$ is the length of the time series and $\sigma$ is the standard
deviation of the noise, the threshold lies with high probability just above the
maximum amplitude of the noise coefficients. This value of $T_U$ is known as
the universal threshold. Since the value of $\sigma$ is unknown, an estimate
$\hat{\sigma}$ of $\sigma$ is used. A good value for $\hat{\sigma}$ is given
by [1]

$$\hat{\sigma} = \frac{M_X}{0.6745}, \qquad (11)$$

where $M_X$ denotes the median of the absolute values of the wavelet
coefficients at the finest scale, that is, $\{d_{1,k}\}_k$, since they are the
ones related to the highest frequency components in the data and, therefore, to
the noise.
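
Expressions (10) and (11) translate directly into a few lines of code. The sketch below uses only the Python standard library; the function names and the sample detail coefficients are hypothetical.

```python
import math
import statistics

def estimate_noise_sigma(finest_details):
    """Noise level from the median absolute value of the finest-scale
    detail coefficients, as in (11)."""
    return statistics.median([abs(d) for d in finest_details]) / 0.6745

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln N) of (10)."""
    return sigma * math.sqrt(2.0 * math.log(n))

# Hypothetical finest-scale coefficients of a series with N = 1341 samples.
finest = [0.6745, -0.6745, 1.349, -0.1, 0.2]
sigma_hat = estimate_noise_sigma(finest)
print("sigma estimate:", sigma_hat)
print("universal threshold:", universal_threshold(sigma_hat, 1341))
```

The median is used instead of the standard deviation because a few large coefficients carrying signal barely affect it.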

To evaluate the performance of a denoising strategy, a loss function is
considered which evaluates the norm of the difference between the pure signal
$f$ and its estimate $\tilde{f}$ obtained with a threshold $T$, that is,

$$r(f,T) = E\{\|f - \tilde{f}\|^2\}. \qquad (12)$$

The thresholding risk can be reduced by choosing a value of $T$ that is lower
than the one given in (10) and depends on the data to a greater extent.


SURE Estimator

For thresholding-based denoising the wavelet coefficients above
the threshold $T$ are smoothed and the ones below $T$ are set to zero. Thus, the
risk associated with a signal estimation with a threshold $T$ is given by [1]

$$r(f,T) = \sum_{i=1}^{N} \Phi(|d_i|^2), \qquad \Phi(x) = \begin{cases} x - \sigma^2, & \text{if } x \leq T^2, \\ \sigma^2 + T^2, & \text{if } x > T^2, \end{cases} \qquad (13)$$

where $d_i$ denotes the $i$-th wavelet coefficient. Thus, the value of $T$ must
be chosen in order to minimize the expression given in (13). The applied
criterion is based on the SURE (Stein’s Unbiased Risk) estimator, which is
briefly described as follows:

1. The wavelet coefficients are arranged according to their modulus in
decreasing order:

$$|d_1| \geq |d_2| \geq \ldots \geq |d_N|, \qquad (14)$$

where $N$ is the number of wavelet coefficients and $d_i = d_{j,k}$ for certain
$j$ and $k$, and $i = 1, 2, \ldots, N$.

2. Given a value $T$ of the threshold, let $a_T \in \mathbb{N}$,
$1 \leq a_T \leq N$, be such that

$$|d_1| \geq \ldots \geq |d_{a_T}| > T \geq |d_{a_T+1}| \geq \ldots \geq |d_N|. \qquad (15)$$

Taking into account expression (15), the risk associated with the
value $T$ of the threshold is given by

$$\tilde{r}(f,T) = \underbrace{\sum_{k=a_T+1}^{N} |d_k|^2 - (N - a_T)\,\sigma^2}_{(1)} + \underbrace{a_T\,(\sigma^2 + T^2)}_{(2)}, \qquad (16)$$

where the first term (1) is the contribution to the risk of the
coefficients whose amplitudes are below $T$ and the second term (2) is
related to the coefficients with amplitudes above $T$.

3. Thus, expression (16) must be recalculated for each of the $N$ wavelet
coefficients and the value of the threshold $T$ is chosen to be

$$T = |d_a|, \qquad (17)$$

for a certain $a$, $1 \leq a \leq N$, such that (16) is minimum.

4. This algorithm is not always suitable for the filtering of the data. In
particular, if the data are too noisy, that is, if the energy of the pure
signal $f$ is small compared to the energy of the noise $W$, the energy of the
data will be spread among too many wavelet coefficients and it can happen that
few coefficients are above $T$, so the reconstructed signal is almost zero since
most of the coefficients are set to zero by the filtering procedure.

When this situation occurs, the value for the threshold is taken to be the
universal threshold (10). Therefore, the energy of the signal $f$ must
first be compared to a minimum energy level given by [22]

$$\varepsilon_N = \sigma^2 N^{1/2} (\ln N)^{3/2}. \qquad (18)$$

Since the energy of $f$ is unknown, an estimate must be used. Since

$$E\{\|X\|^2\} = \|f\|^2 + N\sigma^2, \qquad (19)$$

an estimate of $\|f\|^2$ is given by $\|X\|^2 - N\sigma^2$. With this value
of $\varepsilon_N$, the threshold will be given by

$$\begin{cases} \sigma\sqrt{2 \ln N}, & \text{if } \|X\|^2 - N\sigma^2 \leq \varepsilon_N, \\ T, & \text{if } \|X\|^2 - N\sigma^2 > \varepsilon_N, \end{cases} \qquad (20)$$

where $T$ is described as in (17).
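
Steps 1 to 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the function names are hypothetical, the noise level `sigma` is taken as known, and ties between coefficients of equal amplitude are handled by counting only the coefficients strictly above the candidate threshold.

```python
import math

def sure_threshold(coeffs, sigma):
    """Candidate threshold T = |d_a| that minimises the risk estimate (16)."""
    mags = sorted((abs(c) for c in coeffs), reverse=True)
    n = len(mags)
    # tail_energy[a] = sum of |d_k|^2 over the ranks a+1, ..., N (0-based slice a:)
    tail_energy = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail_energy[i] = tail_energy[i + 1] + mags[i] ** 2
    best_t, best_risk = None, math.inf
    above = 0  # number of coefficients strictly larger than the candidate T
    for i, t in enumerate(mags):
        if i > 0 and t < mags[i - 1]:
            above = i
        risk = (tail_energy[above] - (n - above) * sigma ** 2
                + above * (sigma ** 2 + t ** 2))
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t

def hybrid_threshold(coeffs, sigma):
    """Rule (20): universal threshold when the estimated signal energy is
    below the level (18), SURE threshold otherwise."""
    n = len(coeffs)
    data_energy = sum(c * c for c in coeffs)
    min_energy = sigma ** 2 * math.sqrt(n) * math.log(n) ** 1.5
    if data_energy - n * sigma ** 2 <= min_energy:
        return sigma * math.sqrt(2.0 * math.log(n))
    return sure_threshold(coeffs, sigma)

print(hybrid_threshold([10.0, -0.5, 0.3, -0.2], 1.0))   # signal dominates
print(hybrid_threshold([0.5, -0.3, 0.2, 0.1], 1.0))     # noise dominates
```

In the first call one coefficient carries most of the energy, so the data-driven threshold is used; in the second the estimated signal energy is below the level (18) and the universal threshold is returned instead.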

Threshold Adapted to Scale 

Since the number of coefficients decreases as the scale increases, it
is useful to adapt the value of the threshold to the scale. With this scale
dependence of the threshold, we avoid having too large a value of $T$ for the
wavelet coefficients corresponding to a large scale, where the number of
coefficients is lower because of the subsampling in Mallat’s algorithm, which
would imply that most of the coefficients would be set to zero.

The suggested modification calculates a value $T_j$ for each scale $s = 2^j$, by
applying the SURE algorithm previously described to the wavelet coefficients
at level $j$, that is, to $\{d_{j,k}\}_k$, and minimizing the expression given
in (13).

Choice of the Best Wavelet Basis 

Since wavelet denoising is based on the application of a threshold to the
wavelet coefficients, and no good estimate is obtained if the energy of the
data is too spread, the best wavelet basis will be the one which best
concentrates the energy of the data, that is, in the minimum number of wavelet
coefficients.

Marshall and Olkin proved in [23] that the best basis $B = \{\psi_n\}_n$ will
be the one whose coefficients minimize a cost function,

$$C(f, B) = \sum_{n} \Phi\!\left(\frac{|\langle f, \psi_n \rangle|^2}{\|f\|^2}\right), \qquad (21)$$

where $\Phi(x)$ is a concave function such as the entropy
$\Phi(x) = -x \log x$ or the one derived from the norm $\|\cdot\|_1$.
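
As a sketch of how such a cost behaves, the entropy choice of the concave function can be evaluated on the normalised energies of a set of coefficients. The function name and the two sample coefficient sets are illustrative assumptions, and the normalisation by the total energy follows the usual definition of this cost.

```python
import math

def entropy_cost(coeffs):
    """Entropy cost: sum of -p * log(p) over the normalised energies
    p = |c|^2 / (total energy). Lower values mean the energy is
    concentrated in fewer coefficients."""
    total = sum(c * c for c in coeffs)
    cost = 0.0
    for c in coeffs:
        p = c * c / total
        if p > 0.0:
            cost -= p * math.log(p)
    return cost

concentrated = [1.0, 0.0, 0.0, 0.0]   # all the energy in one coefficient
spread = [0.5, 0.5, 0.5, 0.5]         # same energy, evenly spread
print(entropy_cost(concentrated), entropy_cost(spread))
```

Both sets have the same total energy, but the concentrated one has the lower cost, which is why a basis with a low cost is better suited to thresholding.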


5 Application to the GPS Data 

5.1 Detection of Periodicities in the Data 

The existence of certain periodicities affecting GPS data is well known,
in particular those related to periods of 1 year, 6 months and approximately
400 days. These components appear in GPS data from global and regional
networks, with no geographic dependence and in the three components of the
surveyed stations. They are associated with the Earth’s rotational period and
polar motion [24, 25, 26, 27].

Since our data cover a period of 3 months at most, the usual long-period
components are not detectable. In contrast, the 30 min solution
rate allows the detection of shorter periodicities in the data.

In order to determine the periodic components affecting our GPS data with
this sampling rate, the existing gaps and missing data were first interpolated
to get a uniform sample. The best fit among interpolations of 1st, 2nd
or 3rd order was chosen. The number of data considered in the interpolation
of the gaps depends on the length of the gap according to the following
criterion: for gaps shorter than 6 h, the points used for the interpolation
correspond to the 12 h before and after the gap; for gaps between 6 h and 1
day, the points of the sample cover one day before and after the gap; finally,
for gaps longer than 1 day, data corresponding to 3 days before and after the
gap are considered in the interpolation.
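
The window selection rule can be written compactly. The sketch below is only an illustration: the function names are hypothetical, and assigning a gap of exactly 6 h or exactly 1 day to the middle class is an assumption, since the text does not say how the boundaries are treated.

```python
def interpolation_window_hours(gap_hours):
    """Length (in hours) of the data window taken on each side of a gap."""
    if gap_hours < 6.0:
        return 12.0
    if gap_hours <= 24.0:
        return 24.0
    return 72.0

def samples_per_side(gap_hours, sampling_minutes=30.0):
    """Number of solutions on each side of the gap for a given sampling rate."""
    return int(interpolation_window_hours(gap_hours) * 60.0 / sampling_minutes)

for gap in (2.0, 10.0, 36.0):
    print(f"gap of {gap:4.1f} h -> {samples_per_side(gap)} points on each side")
```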

Once the data were uniformly sampled, the wavelet transform was calculated
using MATLAB 6.5 and the WaveLab 8.0 package from Stanford University, with
some functions modified according to our needs.

The dominant frequency components are identified with the maxima of the
energy of the wavelet transform along the decomposition scales, that is, with
the local maxima in the scalegram.

By way of example, Figs. 7 and 8 show the wavelet spectrum and the wavelet
scalegram of the East and North components of FUMA station at Fumaroles
Bay for the 2003-04 Antarctic campaign, respectively. The detected
periodicities for FUMA and PEND stations, at Fumaroles Bay and Pendulum
Cove respectively, for the 2003/04, 2004/05 and 2005/06 campaigns are listed
in Table 2.

Fig. 7: Wavelet spectrum (a) and scalegram (b) for the East component of FUMA
station at Fumaroles Bay and 2003-04 Antarctic campaign. Each time unit u in (a)
represents a 30 min interval.

Fig. 8: Wavelet spectrum (a) and scalegram (b) for the North component of FUMA
station at Fumaroles Bay and 2003-04 Antarctic campaign. Each time unit u in (a)
represents a 30 min interval.

The causes of some of these periodicities are still unknown. The ones
corresponding to the highest frequencies are associated with the satellite
orbital period, while medium components are probably due to ionospheric activity,
which is particularly important in polar regions. In fact, [28] reveals the
existence of some periods of several days which originate from ionospheric
effects. Other possible sources causing the medium frequency components are


Table 2: Detected periodicities for the GPS stations at Fumaroles Bay (FUMA) and
Pendulum Cove (PEND) for the 2003/04, 2004/05, and 2005/06 Antarctic
campaigns from the analysis of the scalegram of the data.




FUMA 


PEND 



East 

North 

Up 

East 

North 

Up 


12 h 

1 d 

12 h 

12 h 

12 h 

12 h 

2003/04 

8 d 

8 d 

1 d 

4-5 d 

8-12 d 

6-9 d 

8 d 




12 d 





12 h 

12 h 

12 h 

12 h 


12 h 




3-5 d 

1 d 

1-2 d 


2004/05 



14 d 

9 d 

6 d 

4-5 d 


12 h 

12 h 

12 h 

12 h 

12 h 

1 d 


3 d 

9 d 


5 d 



2005/06 

6-7 d 




14 d 



27-28 d 

24-28 d 

25-28 d 


28 d 

24-28 d


the ones related to the weather cycle, such as the presence of strong winds or
the action of some tide effects, typical of marginal seas and inner bays, and
different from the usual diurnal and semidiurnal solar and lunar tides. The
study of other kinds of measurements (e.g. tide gauge measurements,
temperature records) is required to better determine the sources causing
the periodic fluctuations in the data, especially those related to medium
and long periods.


5.2 Data Denoising 


The filtering approach described in Sect. 4.2 was applied to the GPS data
for FUMA and PEND stations (Fig. 9).

Since the results can vary slightly depending on the considered wavelet
function, the denoised signal is estimated as follows:

1. The wavelet decomposition is calculated with several wavelet bases from
a set of wavelet functions (wavelet dictionary, [1]).

2. The four bases for which the cost criterion given in (21) is lowest are
selected. These are the bases that best concentrate the energy of the data.

3. The resulting estimated signal is taken to be the average of the denoised
signals from the four wavelet bases determined in step 2.
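
The three steps above can be sketched as follows, assuming that the cost and the denoised series have already been computed for every candidate basis. The function names, the basis labels and all numbers in the example are hypothetical and serve only to show the bookkeeping.

```python
def average_best_bases(candidates, n_best=4):
    """candidates maps a basis name to (cost, denoised_series).
    Returns the names of the n_best lowest-cost bases and the
    epoch-by-epoch mean of their denoised series."""
    ranked = sorted(candidates, key=lambda name: candidates[name][0])
    chosen = ranked[:n_best]
    n_epochs = len(candidates[chosen[0]][1])
    mean_series = [
        sum(candidates[name][1][i] for name in chosen) / len(chosen)
        for i in range(n_epochs)
    ]
    return chosen, mean_series

def mean_deviation(series, mean_series):
    """Average signed difference between one denoised series and the mean."""
    return sum(x - m for x, m in zip(series, mean_series)) / len(series)

# Hypothetical costs and two-epoch denoised series for five candidate bases.
candidates = {
    "sym5": (0.9, [1.0, 2.0]),
    "db3": (1.2, [3.0, 2.0]),
    "coif1": (1.0, [1.0, 4.0]),
    "sym9": (1.1, [3.0, 4.0]),
    "db7": (2.0, [100.0, 100.0]),
}
chosen, mean_series = average_best_bases(candidates)
print(chosen, mean_series)
print(mean_deviation(candidates["sym5"][1], mean_series))
```

Averaging over several well-suited bases reduces the dependence of the final estimate on any single wavelet choice, at the cost of one decomposition per candidate basis.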

Table 3 summarizes the best bases for each station and component, and the
average $\sigma_{comp}$ of the differences between the filtered values obtained
with each wavelet basis for each 30 min interval and the mean value that has
been taken as the final solution, that is,

$$\sigma_{comp}^{B_j} = \frac{1}{N} \sum_{i=1}^{N} \left( x_{comp}^{B_j}(i) - \bar{x}_{comp}(i) \right), \qquad (22)$$



[Figure 9: panels (a) East, (b) North and (c) Up; time axis from 02/12/03 to 31/12/03]

Fig. 9: East (a), North (b) and Up (c) component of FUMA station for the GPS
campaign 2003/04. The solid line denotes the filtered data obtained by applying the
SURE denoising procedure explained in the text.

Table 3: Mean deviation corresponding to the wavelet bases for which the cost
function reaches the lowest values. B denotes the wavelet bases considered in the
filtering, and σ_j, j = E, N, U (East, North and Up component) is the mean
deviation of the raw data with respect to the filtered data for the corresponding
wavelet bases.

Campaign  Station  Component  Row   (four selected bases)
2003-04   FUMA     East       B     bior4.4   bior5.5   sym5      sym9
                              σ_E   -0.05     0.11      0.02      -0.08
                   North      B     sym5      sym9      sym7      db3
                              σ_N   0.05      1.2       -1.7      2.8
                   Up         B     coif1     coif2     db3       sym3
                              σ_U   1.9       3.5       -2.7      -2.7
          PEND     East       B     bior5.5   bior4.4   db4       db7
                              σ_E   0.4       0.4       -0.03     -0.8
                   North      B     coif3     coif1     sym10     sym8
                              σ_N   -0.6      1.1       -0.2      -0.2
                   Up         B     sym5      sym4      coif1     sym6
                              σ_U   -9.8      -1.4      8.8       2.4
2004-05   FUMA     East       B     coif5     coif1     db7       coif3
                              σ_E   -0.4      0.4       -0.06     0.09
                   North      B     coif1     sym5      coif5     coif4
                              σ_N   -0.5      2.2       -0.8      -0.8
                   Up         B     coif4     sym10     sym8      sym5
                              σ_U   -2.9      -2.8      0.2       5.6
          PEND     East       B     coif1     sym20     db3       sym3
                              σ_E   0.8       -0.8      0.01      0.01
                   North      B     sym5      sym9      coif4     sym10
                              σ_N   -0.8      -0.2      0.4       0.5
                   Up         B     sym4      coif2     coif4     sym6
                              σ_U   13.3      -5.4      [illegible]  -0.9
2005-06   FUMA     East       B     bior5.5   bior4.4   coif5     rbio1.5
                              σ_E   -1        0.2       0.03      -0.05
                   North      B     coif1     sym4      coif5     coif2
                              σ_N   -0.5      -0.0      0.2       0.3
                   Up         B     coif3     coif1     coif2     sym4
                              σ_U   -3.9      -2.0      -4.7      6.6
          PEND     East       B     bior5.5   rbio6.8   sym4      db6
                              σ_E   -0.3      -0.5      1.1       -0.3
                   North      B     db3       sym3      coif2     coif1
                              σ_N   -0.1      -0.1      0.08      0.2
                   Up         B     sym4      coif2     coif1     db3
                              σ_U   -6.7      5.8       -1.8      2.7


where $N$ is the sample size, $x_{comp}^{B_j}(i)$ is the estimated value obtained
with the wavelet basis $B_j$ for $j = 1, \ldots, 4$ and the time interval (or
epoch) $i$, and $\bar{x}_{comp}(i)$ is the i-epoch mean value given by

$$\bar{x}_{comp}(i) = \frac{1}{4} \sum_{j=1}^{4} x_{comp}^{B_j}(i). \qquad (23)$$

Bases B given in Table 3 correspond to the following wavelet families:
Biorthogonal (bior), Symmlet (sym), Daubechies (db), Coiflet (coif), and
Reverse Biorthogonal (rbio). The number next to the wavelet name indicates
the number of vanishing moments, a wavelet property related to the support
of the wavelet and to its regularity, in the sense that for a wavelet with p
vanishing moments the wavelet coefficients of a polynomial of degree lower
than p will be zero [1, 29].

The remarkable point of this method is the great reduction in the deviation
of the three components (East, North and Up) of the stations. Table 4
includes the most representative statistical parameters of the raw data and the
filtered time series. In particular, it can be observed that the greatest
reduction is obtained in the vertical component, which constitutes an important
achievement since, as mentioned before, the Up component is the least
reliable one because it is more influenced by the effects affecting the GPS
signal. In fact, with this denoising procedure the deviation in the Up
component is reduced to the level of the horizontal components before the
denoising procedure.


Table 4: Some representative statistical parameters (in mm) of the time series before 
and after the denoising. $\bar{x}_j$, $r_j$ and $\sigma_j$ denote the mean, the range and the standard 
deviation of the j component (j = East, North, Up). 

| Campaign | Component | Data | FUMA $\bar{x}_j$ | FUMA $r_j$ | FUMA $\sigma_j$ | PEND $\bar{x}_j$ | PEND $r_j$ | PEND $\sigma_j$ |
|----------|-----------|------|------|-------|------|------|-------|------|
| 2003-04  | East      | O    | 0.2  | 44.2  | 6.4  | -0.1 | 42.2  | 5.7  |
|          |           | F    | 0.5  | 14.8  | 2.2  | 1.1  | 4.3   | 0.8  |
|          | North     | O    | -0.3 | 62.3  | 8.9  | -0.1 | 54.9  | 7.9  |
|          |           | F    | 1.0  | 5.3   | 0.9  | 0.9  | 10.0  | 0.7  |
|          | Up        | O    | 0.8  | 292.3 | 40.5 | 0.6  | 248.7 | 35.5 |
|          |           | F    | -4.9 | 62.3  | 8.6  | -9.5 | 39.0  | 5.2  |
| 2004-05  | East      | O    | -1.3 | 29.3  | 3.9  | -1.6 | 55.4  | 6.5  |
|          |           | F    | -0.6 | 5.5   | 0.7  | -0.6 | 6.0   | 0.6  |
|          | North     | O    | -0.3 | 45.9  | 6.1  | 2.4  | 59.2  | 7.7  |
|          |           | F    | 0.8  | 10.6  | 1.1  | 2.2  | 3.7   | 0.7  |
|          | Up        | O    | -4.2 | 201.9 | 25.3 | -0.9 | 240.2 | 35.8 |
|          |           | F    | -3.3 | 11.0  | 1.9  | 8.4  | 49.4  | 5.5  |
| 2005-06  | East      | O    | 0.1  | 28.1  | 3.8  | 0.2  | 44.0  | 5.7  |
|          |           | F    | -0.1 | 4.5   | 0.7  | 0.8  | 10.6  | 1.4  |
|          | North     | O    | 0.0  | 40.4  | 5.4  | 0.2  | 46.2  | 6.1  |
|          |           | F    | -0.2 | 9.0   | 1.0  | 0.8  | 8.3   | 0.6  |
|          | Up        | O    | -0.1 | 169.4 | 24.0 | -0.4 | 208.9 | 29.7 |
|          |           | F    | 5.1  | 20.4  | 2.9  | -9.1 | 20.0  | 3.9  |

5.3 Crustal Deformation Models 

The local deformation of Deception Island was estimated from the filtered 
data according to the processing procedure described in Sect. 3. 

The variation of the stations' coordinates through time provides the global 
deformation of the island, which includes regional and local tectonic effects. 
To isolate the local deformation, the displacements relative to the BEJC geodetic 
station at Livingston Island were estimated. In addition, the determination 
of the displacements of the REGID network on Deception Island with respect 
to the reference station BEGC provides an inner perspective of the relative 
deformation on the island. Figures 10-13 show the estimated models, and 
Table 5 summarizes the obtained values for the displacement rates. 

The deformation rates estimated from the SURE filtered data were compared 
to the values obtained by means of common-mode filtering. The common-mode 
strategy is widely used with this kind of data [30] and consists of 
calculating moving averages with a window length of one sidereal day. We have 
adapted this procedure by calculating a mean value for every 30 min session 
within a day; that is, the filtered values are given by 


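Table 5: Tectonic and volcanic displacement rates for FUMA and PEND stations for 
the periods 2003/04-2004/05 and 2004/05-2005/06, obtained by filtering the data 
using common-mode and SURE denoising. 

| Period | Displacements | Station | Filter | East v (mm/yr) | $\sigma_E$ | North v (mm/yr) | $\sigma_N$ | Up v (mm/yr) | $\sigma_U$ | Horizontal v (mm/yr) | $\sigma_{hor}$ |
|--------|---------------|---------|--------|------|----|------|----|-------|----|-------|----|
| 2003/04-2004/05 | Absolute tectonic | FUMA | Com-mode       | 13.0 | 7  | 9.5  | 12 | -71.9 | 33 | 16.10 | 13 |
|                 |                   | FUMA | SURE denoising | 13.2 | 4  | 11.4 | 3  | -75.3 | 10 | 17.4  | 5  |
|                 |                   | PEND | Com-mode       | 16.9 | 12 | 11.0 | 18 | -7.52 | 40 | 20.2  | 21 |
|                 |                   | PEND | SURE denoising | 16.9 | 4  | 10.7 | 3  | -70.0 | 24 | 20.0  | 4  |
|                 | Volcanic          | FUMA | Com-mode       | 6.3  | 30 | 10.8 | 18 | -72.5 | 29 | 12.0  | 34 |
|                 |                   | FUMA | SURE denoising | 6.7  | 3  | -8.5 | 3  | -75.9 | 15 | 10.0  | 4  |
|                 |                   | PEND | Com-mode       | 10.4 | 39 | -9.3 | 24 | -74.8 | 36 | 13.0  | 45 |
|                 |                   | PEND | SURE denoising | 10.5 | 4  | -9.3 | 4  | -70.8 | 24 | 14.0  | 5  |
| 2004/05-2005/06 | Absolute tectonic | FUMA | Com-mode       | 8.2  | 7  | 17.4 | 12 | 63.7  | 27 | 19.2  | 13 |
|                 |                   | FUMA | SURE denoising | 8.2  | 4  | 15.5 | 3  | 64.5  | 10 | 1.5   | 5  |
|                 |                   | PEND | Com-mode       | 23.0 | 11 | 28.2 | 17 | 71.2  | 37 | 36.3  | 20 |
|                 |                   | PEND | SURE denoising | 23.2 | 4  | 28.5 | 4  | 68.7  | 24 | 36.3  | 5  |
|                 | Volcanic          | FUMA | Com-mode       | 3.5  | 25 | 15.7 | 15 | 58.9  | 24 | 16.0  | 3  |
|                 |                   | FUMA | SURE denoising | 2    | 3  | -5   | 3  | 64.2  | 15 | 5.4   | 4  |
|                 |                   | PEND | Com-mode       | 18.4 | 35 | 13.6 | 22 | 66.2  | 33 | 22.8  | 40 |
|                 |                   | PEND | SURE denoising | 16.1 | 6  | 7.8  | 4  | 68.2  | 6  | 17.8  | 7  |
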
(24) 

where $x_{ss,d}$ is the estimated solution for the session $ss = 1, \ldots, 48$ and the 
observed day $d$, and 

$$\bar{x}_{ss} = \frac{1}{D} \sum_{d=1}^{D} x_{ss,d} \tag{25}$$

with D being the number of surveyed days. 
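
The session-wise averaging described above can be illustrated with a minimal Python sketch. It assumes that the solutions are stored as an array with one row per surveyed day and one column per 30 min session, and that the common-mode term is the mean of each session over the D days. Subtracting that mean from the individual solutions is an assumption of this sketch, as are the function and variable names.

```python
import numpy as np

def common_mode_filter(x):
    """Session-wise common-mode sketch.

    x has shape (D, n_sessions): one row per surveyed day, one column
    per 30 min session (48 columns for a full day).
    Returns (residuals, session_mean).
    """
    x = np.asarray(x, dtype=float)
    # Mean of each session over the D surveyed days.
    session_mean = x.mean(axis=0)
    # Assumption of this sketch: the filtered value is the solution
    # minus the mean of its session.
    residuals = x - session_mean
    return residuals, session_mean

# Toy example with D = 3 days and 2 sessions per day.
toy = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
toy_residuals, toy_mean = common_mode_filter(toy)
```

With real data the array would have 48 columns, and missing sessions would need explicit handling, which this sketch leaves out.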

Although the displacement rates obtained by both methods do not differ 
considerably, with differences on the order of a few millimeters, the 
remarkable point of the proposed wavelet methodology is the large reduction in the 
data deviation, especially in the vertical component, whose standard deviation 
decreases to the values of the horizontal components before the filtering of 
the data, as shown in Table 5. The calculated displacements agree with 
the pattern detected for the previous period [7, 31]. In fact, during the period 
2001/02 - 2002/03 a remission of the extensive radial process detected after 
the volcanic crisis in 1998 was observed, which continues in the following 
period. The deformation pattern for this period is also included in Fig. 10 in 
order to better interpret the results for the following periods, although GPS 
data from these campaigns were not considered for the wavelet analysis. No 
significant displacements were detected afterwards, although a change in the 
trend of the surveying stations is observed for the 2003/04 - 2004/05 period, 
aligned with the Hero Fracture Zone. 

6 Conclusions and Outlook 

The strategy presented in this paper for the analysis of GPS time series com¬ 
bines relevant information obtained from a multiscale wavelet-based decom¬ 
position of the data, with a double objective: (a) the detection of periodic 
components in GPS data when they are processed with a 30 min sampling 
rate, and (b) the filtering of the data. 

The detected higher-frequency components are related to the orbital period 
of the satellites, while the medium ones seem to be related to more local 
sources such as weather-related effects. The origin of the longer-period 
components is not yet well determined; further research involving other 
measurements, such as tide or temperature records, is planned for future 
Antarctic campaigns. 

On the other hand, the denoising strategy has provided very good results, 
reducing the scatter of the data in the three components. The decrease in 
the standard deviation of the data yields a reduction in the errors associated 
with the estimated deformation rates. Particularly remarkable are the results 
obtained for the Up component: before denoising, the error of the vertical 
component is one order of magnitude larger than the error of the horizontal 
components, and it drops to the magnitude of the horizontal ones after the 
wavelet filtering. 
Abstract. This chapter is dedicated to the description and testing of a new 
method of obtaining Probabilistic Activation Maps of seismic activity. The 
method is based upon two major concepts: Cellular Automata (CA) and 
Information Theory. It can be used in other fields, as long as the spatially 
extended system is described in terms of a Cellular Automaton with two 
available states, +1 and -1, as in the Ising case described here. The 
crucial point is to obtain, by means of an entropic principle, the rules of an 
Ising Cellular Automaton that maps one pattern into its future state. We have 
already applied this technique to the seismicity in two regions: Greece and the 
Iberian Peninsula. In this chapter, we study other regions to test whether the 
observed behavior holds in general. For this purpose, we discuss the results 
for California, Turkey and Western Canada. The Cellular Automaton rules 
obtained from the corresponding catalogs are found to be well described by 
an Ising scheme. When these rules are applied to the most recent pattern, we 
obtain a Probabilistic Activation Map, which represents the probability of 
surpassing a certain energy (equivalent to a certain magnitude) in the next 
interval of time and is useful information for seismic hazard assessment. 

Keywords: Cellular automata, Probabilistic seismic hazard, Information 
theory 


1 Introduction 

When looking for a framework that allows for studying nonlinearity and 
stochasticity at the same time, Information Theory is one of the most natural 
candidates. Information Theory confronts the problem of constructing models 
from experimental time series. These models are used to make predictions 
when the underlying dynamics is unknown (or known, but of such high 
dimensionality that it is incomputable). Information Theory was first described 
by Shannon [1, 2] and Shannon and Weaver [3]. This formalism was 
later used by Shaw [4] to study the time series produced by drops of water 
falling from a dripping faucet. He established an alternative 
way to deal with complex problems in the phase space. The behavior and 
evolution of a system for which a series of states is known at times 
$T_1, T_2, \ldots, T_n$ can be characterized by a return map; that is, by 
representing the state at $T_i$ on the x-axis and the one at $T_{i+1}$ on the y-axis, 
and continuing in this way until the adequate dimension is obtained. Shaw used the 
resulting scatter plots and the concept of information based on Shannon’s 
entropy to study how much can be known about the future states from the present 
and the past states. This knowledge can be characterized, among other ways, 
by a quantity called mutual information, which will be described later. In a 
nonlinear, spatially extended system, information can be produced in certain 
regions, and this information can be observed in other regions of the system 
some time later. If localized dynamics at point A result in information 
generation, then with finite-accuracy measurements it will take some time for the 
information which is observable in A at time t to be observable at point B; 
the time is, roughly, the diffusive time $L^2/D$, where L is the spatial separation 
of A and B, and D is a diffusion coefficient. Information transport can be 
detected by computation of an information-theoretic quantity, the time-delayed 
mutual information, between measurements of the system at separate spatial 
points [5]. Our technique uses this concept in a discrete representation of the 
system, in the form of a Cellular Automaton (CA). The information transport 
is carried out by the rules of this CA. Our goal is to maximize the information 
contained in the neighborhood (A) at a given time about the state at a certain 
point B (the central cell) at a later time. 

CA have been proposed as a model for self-organized criticality [6, 7], 
and have been applied as a simple analogue of earthquake occurrence by 
many authors [8, 9, 10]. Those CA are proposed as direct models, based on 
physical hypotheses, and yield statistics similar to those found in earthquake 
catalogs. These models correspond to a new approach to modeling earthquake 
activity in seismology. From the first models of fault rupture [11], we 
have moved forward to describing it as a complex system where interactions 
between the faults play a main role. This is also due to the fact that 
seismologists have found significant relations between seismicity and critical systems 
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. These findings have led to a new approach 
to this problem based on statistical mechanics. In this framework, 
the proposed CA models try to capture this critical behavior in a direct way. With 
our approach, instead, we try to construct a simple CA that best fits the 
actual sequence of earthquake occurrence in a seismic catalog. Our technique 
has already been used in two seismically active regions (Greece and the Iberian 
Peninsula) [22, 23], where we found that an Ising CA model is capable of 
describing the major behavior of the considered records in a reasonable way. In 
this contribution, we study earthquake catalogs from three other regions, in 
order to further validate whether Ising CA models are good candidates for 
deriving meaningful information about the dynamics of seismic activity. This 
chapter is organized as follows: In Sect. 2, we describe our method in some 
detail. In Sect. 3 we present the study regions; Sect. 4 is devoted to our results, 
and finally, in Sect. 5 we present our conclusions. 


2 Description of the Method 

As already stated, we use a discrete representation of the seismic region. 
Our dynamical model is a CA which simulates the spatio-temporal evolution 
of the different seismic patterns obtained from the discretization process. The 
interactions are static, that is, independent of time. Yet, seismicity might have 
non-stationary features, so other types of dynamics are certainly conceivable 
as well. Our approach deals with an Ising-like behavior in the sense that cells 
tend to be in the same state as their neighborhood. The corresponding model 
is fully explained in [23], so here we point out just the principal steps. 
As a cellular automaton, it is represented by its lattice $Z^d$ of cells, a finite set 
$A$ of states, a neighborhood set $N \subset Z^d$, and a local rule $f$ updating the cells. 

The coarse-graining of the events (Fig. 1), both spatially and temporally, 
produces a series of lattices and, after a state is assigned (active, +1, or 
quiescent, -1, in this case), a series of patterns is obtained. The dynamics 
of these patterns is what we want to simulate. With that, we have already 
chosen the lattice $Z^d$ (square cells in 2D) and the set of states $A$. 

The activation criteria are based on the time series given by the quantity: 

$$e_q(N(\tau)) = \sum_{n=1}^{N(\tau)} e_n^q \tag{1}$$

where $e_n$ is the released energy of the n-th event, $N(\tau)$ is the number of 
earthquakes in a given interval of time $\tau$, and the energy is calculated 
from the relationship between magnitude and energy [24]. If $e_q$ exceeds a 
certain threshold in the time interval, the cell is considered active (+1); 
otherwise, it is quiescent (-1). Note that $e_1$ is the accumulated energy, $e_{1/2}$ 
represents the Benioff strain, $e_{1/3}$ is a proxy for the accumulated radius of the 
rupture, and $e_0$ represents the number of events. The lower the value of q, 
the higher the weight of small earthquakes in $e_q$. In general, $e_q$ is 
a proxy for the stress accumulation in the cell. 

Fig. 1: Coarse-graining. 
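
As an illustration of the activation criterion, the following Python sketch accumulates the q-th power of the released energies in one cell and one time interval and compares the result with a threshold. The magnitude-energy relation used here, log10 E = 1.5 m + 4.8 (E in joules), is an assumption of this sketch and need not coincide with the relation taken from [24]. The function names, and the choice of defining the threshold as the value produced by a single event of the threshold magnitude, are likewise hypothetical.

```python
import numpy as np

def energy_from_magnitude(m, slope=1.5, intercept=4.8):
    """Assumed relation log10(E) = slope * m + intercept (E in joules)."""
    return 10.0 ** (slope * np.asarray(m, dtype=float) + intercept)

def cumulative_eq(magnitudes, q):
    """Sum of the q-th powers of the released energies in one cell."""
    energies = energy_from_magnitude(magnitudes)
    return float(np.sum(energies ** q))

def cell_state(magnitudes, q, threshold_magnitude):
    """Return +1 (active) or -1 (quiescent) for one cell and interval.

    Assumption: the threshold is the e_q value of a single event of
    magnitude threshold_magnitude, and reaching it counts as active.
    """
    threshold = float(energy_from_magnitude(threshold_magnitude) ** q)
    return 1 if cumulative_eq(magnitudes, q) >= threshold else -1

# With q = 0 every event contributes 1, so e_0 counts the events.
n_events = cumulative_eq([4.1, 5.0, 3.7], q=0)
```

Changing `slope` changes how many small events are needed to match one large event for a given q, so any comparison of that kind depends on the relation actually adopted.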

As we said, in this work we assume an Ising-like framework, which is 
introduced by the way the interactions between the cells are 
constructed. The rules for updating a cell ($f$) must follow those of an Ising 
system. The original Ising model describes the dynamics of a system of coupled 
spins, where the spin at every site is either up (+1) or down (-1). Unless 
otherwise stated, the interaction is between nearest neighbors only and is 
given by $-J$ if the spins (which we will identify with the state of seismic 
activation) are parallel and $+J$ if the spins are anti-parallel. The total energy 
can be expressed in the form 

$$E = -J \sum_{\langle i,j \rangle} S_i S_j - B \sum_i S_i \tag{2}$$

where $S_i = \pm 1$, $J$ is known as the exchange constant and $B$ is the external 
(magnetic) field. In the seismic case, the analog of the external field 
corresponds to both the plate motion and the ductile zone under the cells. In a CA 
representation, the energy is calculated for each site $i$, with $j$ running over its 
nearest neighbors, and added. So, we define the state of the cell at a time $t$ by 
its energy with respect to its neighbors, while the state at a later time refers to 
its activity. We are then interested in the maximum transmission of information 
from the neighborhood to the cell. 

At that point, we already have a series of lattice configurations (patterns). 
In the CA representation, we assume that each cell interacts only with its 
nearest neighbors, and we can calculate the transition rules directly from 
these patterns [25] by means of a histogram of occurrences. 

In an Ising model, the flip transitions are given by the energy state of the 
cells, so that a cell in a given state has a certain probability of changing, 
depending on the energy of the interactions with its neighborhood and the external field. 
In our method, these probabilities have to be calculated, yielding an inverse 
or data assimilation problem. We assume no a priori hypothesis 
about the nature of the interactions between neighboring sites, nor between 
the sites and an external field. Therefore, we classify the neighborhood 
configurations in terms of their “energetic” state, so that each cell has an associated 
energy, $E_i$, given by Eq. (2), with $S_i$ being the central cell’s state and $S_j$ 
the neighboring cells’ states, without an external field (which would represent 
the driving forces, but cannot be calculated), and with the term $J$ set to 1 
without loss of generality. The “energetic state” of a cell with respect to its 
neighborhood is then given by: 

$$E_i = -S_i \sum_j S_j \tag{3}$$

We use a Moore neighborhood ($N \subset Z^d$), since it is more isotropic 
[25], and the “energy” can take only discrete values in the interval 
$E \in [-8, 8]$. No Ising behavior is imposed on the transition probabilities; 
they are extracted from the data itself by calculating the distribution of the 
neighborhood’s states and its influence on the activity or inactivity of the cell 
in the future. 
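
The local "energy" of each cell and the histogram of occurrences from which the transition rules are read can be sketched in Python as follows. This is only an illustration under stated assumptions: the patterns are 2D arrays of +1 and -1, cells outside the lattice contribute zero to the neighbor sum (the boundary treatment is an assumption of this sketch), and the function names are hypothetical.

```python
import numpy as np

def local_energy(pattern):
    """E_i = -S_i * (sum of the 8 Moore neighbours of cell i).

    Assumption: cells outside the lattice count as 0.
    """
    s = np.asarray(pattern, dtype=int)
    rows, cols = s.shape
    padded = np.pad(s, 1, mode="constant", constant_values=0)
    neighbour_sum = np.zeros_like(s)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the central cell itself
            neighbour_sum += padded[1 + dr:1 + dr + rows,
                                    1 + dc:1 + dc + cols]
    return -s * neighbour_sum

def transition_counts(patterns):
    """Histogram of occurrences over consecutive patterns.

    Returns a dict mapping (state, energy) to [n_stay, n_flip].
    """
    counts = {}
    for now, nxt in zip(patterns[:-1], patterns[1:]):
        now = np.asarray(now, dtype=int)
        nxt = np.asarray(nxt, dtype=int)
        energy = local_energy(now)
        for s, e, s_next in zip(now.ravel(), energy.ravel(), nxt.ravel()):
            entry = counts.setdefault((int(s), int(e)), [0, 0])
            entry[int(s_next != s)] += 1
    return counts

# A fully active 3 x 3 lattice: the central cell has E = -8.
ones = np.ones((3, 3), dtype=int)
central_energy = local_energy(ones)[1, 1]
```

Dividing `n_flip` by `n_stay + n_flip` for each key gives an empirical flip probability, kept separate for initially active and initially inactive cells.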

The transmission of information depends on the number of cells, $N$, and the 
time interval, $\tau$, chosen. By maximizing the time-delayed mutual information, 
$\mu_I$, between the past and future states we can find the model which contains 
the highest correlation between them [26]. The expression for $\mu_I$ in this particular 
model is as follows: 

$$\mu_I = \sum_{i,j,k} p(i; j, k) \log \frac{p(i; j, k)}{p(i)\, p(j, k)}$$

with $p(i; j, k)$ being the joint probability of past and future states, and 
$p(i)p(j, k)$ a distribution of independent states. Here $i$ stands for the central cell 
at time $t + \tau$, and $(j, k)$ relates to the state of the central cell and of its 
neighborhood $k$ at time $t$, with $E_i$ ($i \in [0, n]$, $i \in \mathbb{N}$) representing the 
possible states. The calculated value of $\mu_I$ represents the expected information 
gain when using a model with interacting cells instead of another model where the 
consecutive states are independent [27]. To find the maximum value of $\mu_I$, a grid search 
in time steps and number of cells is carried out and, finally, we derive our 
Cellular Automaton. 
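
A plug-in estimate of the mutual information between a past description and a future state can be sketched in Python as below. The sketch assumes that each observation is a pair of hashable labels (for instance, the tuple of central state and neighborhood energy for the past, and +1 or -1 for the future) and that the logarithm is taken in base 2, which is a choice of this illustration. The function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(past, future):
    """Plug-in mutual information (in bits) from paired observations."""
    past = list(past)
    future = list(future)
    if len(past) != len(future) or not past:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(past)
    joint = Counter(zip(past, future))
    n_past = Counter(past)
    n_future = Counter(future)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(c * n / (n_past[a] * n_future[b]))
    return mi

# Future fully determined by the past: 1 bit for two equiprobable states.
mi_dependent = mutual_information([0, 0, 1, 1], [-1, -1, 1, 1])
# Future independent of the past: 0 bits.
mi_independent = mutual_information([0, 0, 1, 1], [-1, 1, -1, 1])
```

A grid search would call such a function once per candidate box size and number of time intervals and keep the combination with the largest value.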

After obtaining the transition rules, we can test how well they reproduce 
the data. Simulations of the future patterns are carried out [28], and real and 
simulated patterns are compared by means of the correlation function [29] 
and the Hamming distance [30]. The latter measure gives the number of cells 
that failed in the prediction, representing the simulation error between two 
binary patterns in the usual way. 
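
The Hamming distance between a real and a simulated pattern, expressed as a percentage of cells, can be computed with a few lines of Python. This is a minimal sketch; the function name and the percentage normalization are choices of this illustration.

```python
import numpy as np

def hamming_error_percent(real, simulated):
    """Percentage of cells whose state differs between two patterns."""
    real = np.asarray(real)
    simulated = np.asarray(simulated)
    if real.shape != simulated.shape:
        raise ValueError("patterns must have the same shape")
    return 100.0 * np.count_nonzero(real != simulated) / real.size

# One mismatching cell out of four.
example_error = hamming_error_percent([[1, -1], [1, 1]], [[1, 1], [1, 1]])
```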

Finally, in our application to seismic data, if the CA rules are applied to 
the latest pattern, we obtain what we call a Probabilistic Activation Map, 
with the probability of surpassing a certain cumulative $e_q$ (equivalent to a certain 
magnitude) [25, 31, 23]. Since the model takes into account activity (+1) and 
quiescence (-1), a probability $p$ of becoming active corresponds to a 
probability $1 - p$ of quiescence. To highlight the Ising behavior, we have modified 
this scale from [0, 1] to [-1, 1]. Because the spatial extension of the cells that 
maximize the mutual information is large, the maps are slightly smoothed at the 
corners of the cells so that the display is more understandable. In a 
more general case, the map will represent the probability of observing a pattern at 
the next step of the CA. 
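
The construction of such a map from the latest pattern can be sketched in Python as follows. The sketch assumes that the rules are available as a dictionary mapping (state, energy) to the probability p of the cell being active at the next step, that the change of scale is the linear map 2p - 1, that cells outside the lattice count as 0, and that combinations never seen in the catalog are left undefined (NaN). All names are hypothetical and the smoothing of the display is not included.

```python
import numpy as np

def moore_energy(s):
    """E_i = -S_i * sum of the 8 Moore neighbours (zeros outside)."""
    padded = np.pad(s, 1, mode="constant", constant_values=0)
    total = sum(padded[r:r + s.shape[0], c:c + s.shape[1]]
                for r in range(3) for c in range(3))
    return -s * (total - s)  # remove the central cell from the 3 x 3 sum

def activation_map(latest_pattern, p_active):
    """Map each cell to 2p - 1, with p taken from the rule table."""
    s = np.asarray(latest_pattern, dtype=int)
    energy = moore_energy(s)
    out = np.full(s.shape, np.nan)
    for idx in np.ndindex(s.shape):
        key = (int(s[idx]), int(energy[idx]))
        if key in p_active:
            out[idx] = 2.0 * p_active[key] - 1.0
    return out

# Hypothetical rule table for a fully active 3 x 3 lattice.
rules = {(1, -8): 0.9, (1, -5): 0.5, (1, -3): 0.25}
example_map = activation_map(np.ones((3, 3), dtype=int), rules)
```

On this scale, values near +1 indicate probable activity, values near -1 probable quiescence, and 0 corresponds to p = 0.5.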


3 Data Description and Tectonic Setting 

In this section, we describe the three data sets we are using. The California 
and Turkey regions have already been extensively studied [32, 33], so 
they represent two places where a great amount of information has been 
gathered, which allows a reliable discussion of our results. The Western Canada 
region is also an important place, where big events can be expected. 

• The Southern California catalog: The catalog is maintained by the Southern 
California Earthquake Center (SCEC) and contains the seismic data 
for the period 1932-2001. The analyzed area covers 32-40° N and 
115-124° W. The magnitude ranges from 3.0 to 8.0, over which the catalog is complete. 
The maximum depth is 79 km. The faults in the region are shown in Fig. 2. 

• The Turkey catalog: The catalog is maintained by the Kandilli Observatory 
in Istanbul, and spans the years from 1900 to 2004. The area covers 
22-46° E and 31-46° N. The magnitude ranges from 3 to 
7.9. The catalog is complete above magnitude 4.5. The tectonic setting for Turkey 
is shown in Fig. 3. 

• The Western Canada catalog: The catalog was obtained from the Canadian National 
Data Centre and contains data from 1700 to 2004. The area covers 120-135° W and 
46.7-55.1° N. The maximum depth is 105 km, and the magnitude ranges from -0.6 
to 9. The catalog is complete above magnitude 5. The tectonic setting for 
Western Canada is shown in Fig. 4. 



Fig. 2: Sense of slip map for Southern California. Courtesy of John Marquis, created 
using GMT, with a data set cobbled together from several different sources [34, 35, 
36, 37, 38, 39]. 


















Fig. 3: Tectonic setting in Turkey [40, 41]. 



Fig. 4: Tectonic setting in Canada. 
4 Results 

The results for California are shown in Table 1 for each q and 
magnitude threshold tried for the seismic patterns. The analysis for Turkey 
is in Table 2 and, finally, Table 3 shows the results for the Western 
Canadian catalog. The maximum mutual information is found with a box size 
of 2° (≈ 220 × 220 km²) for all the catalogs, except in the case of California 
and a magnitude threshold of 3, where the mutual information is maximized 
at 1° (≈ 110 × 110 km²). For the other catalogs we did not find this, since they 
are not complete at this magnitude. When we use a box size of 2°, the number of 
time intervals that maximizes $\mu_I$ increases (that is, the inter-event time is 
lower; the inter-event time is obtained by dividing the difference between the last 
and first time in the data set by the number of time intervals given in the tables). 
After that, the number of intervals decreases with increasing magnitude 
threshold. A rough estimation of the corresponding diffusion coefficient D [5] 
gives around 500, 130 and 40 m²/s for California and magnitude thresholds 
of 4, 5 and 6, respectively; around 150 and 25 m²/s in Turkey for magnitudes 
5 and 6; and lower values for Western Canada, around 30, 20 and 10 m²/s 
with magnitudes 5, 6 and 7. As [42] pointed out, those values for California 
are consistent with the work by [43], and give a realistic diffusion coefficient 

Table 1: Results of the maximization of the Ising model with an energy threshold 
criterion for California (q is the value of q in Eq. (1), m is the magnitude threshold, 
t and boxsize are the number of time intervals and the spatial resolution in degrees 
that maximize $\mu_I$, the mutual information, the error is the Hamming distance in %, 
and M is the averaged ’magnetization’, or number of active cells). 

| q   | m | t  | boxsize | $\mu_I$ | error (%) | M (%) |
|-----|---|----|---------|------|-----------|-------|
| 1   | 3 | 5  | 1       | 0.78 | 8         | 67    |
| 1   | 4 | 25 | 2       | 0.79 | 12        | 56    |
| 1   | 5 | 7  | 2       | 0.72 | 13        | 47    |
| 1   | 6 | 2  | 2       | 0.83 | 4         | 36    |
| 1   | 7 | 2  | 2       | 0.29 | 12        | 12    |
| 1/2 | 3 | 5  | 1       | 0.78 | 8         | 67    |
| 1/2 | 4 | 37 | 2       | 0.83 | 10        | 55    |
| 1/2 | 5 | 6  | 2       | 0.78 | 11        | 59    |
| 1/2 | 6 | 4  | 2       | 0.81 | 8         | 39    |
| 1/2 | 7 | 2  | 2       | 0.69 | 4         | 20    |
| 1/3 | 3 | 5  | 1       | 0.78 | 8         | 67    |
| 1/3 | 4 | 41 | 2       | 0.85 | 12        | 56    |
| 1/3 | 5 | 12 | 2       | 0.86 | 7         | 56    |
| 1/3 | 6 | 5  | 2       | 0.88 | 6         | 50    |
| 1/3 | 7 | 2  | 2       | 0.65 | 12        | 38    |
| 1/3 | 8 | 2  | 2       | 0.24 | 12        | 12    |

Table 2: Results of the maximization of the Ising model with an energy threshold 
criterion for Turkey. 

| q   | m | t  | boxsize | $\mu_I$ | error (%) | M (%) |
|-----|---|----|---------|------|-----------|-------|
| 1   | 5 | 9  | 2       | 0.54 | 20        | 47    |
| 1   | 6 | 2  | 2       | 0.52 | 13        | 45    |
| 1   | 7 | 2  | 2       | 0.22 | 7         | 14    |
| 1/2 | 5 | 5  | 2       | 0.59 | 16        | 63    |
| 1/2 | 6 | 3  | 2       | 0.60 | 20        | 44    |
| 1/2 | 7 | 2  | 2       | 0.40 | 5         | 18    |
| 1/3 | 5 | 13 | 2       | 0.59 | 18        | 49    |
| 1/3 | 6 | 2  | 2       | 0.66 | 11        | 60    |
| 1/3 | 7 | 2  | 2       | 0.59 | 12        | 35    |
| 1/3 | 8 | 2  | 2       | 0.15 | 10        | 7     |


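Table 3: Results of the maximization of the Ising model with an energy threshold 
criterion for Western Canada. 

| q   | m | t | boxsize | $\mu_I$ | error (%) | M (%) |
|-----|---|---|---------|------|-----------|-------|
| 1   | 5 | 8 | 2       | 0.28 | 9         | 17    |
| 1   | 6 | 4 | 2       | 0.25 | 17        | 16    |
| 1   | 7 | 2 | 2       | 0.12 | 16        | 8     |
| 1   | 8 | 2 | 2       | 0.08 | 3         | 3     |
| 1/2 | 5 | 8 | 2       | 0.28 | 9         | 17    |
| 1/2 | 6 | 4 | 2       | 0.25 | 17        | 16    |
| 1/2 | 7 | 5 | 2       | 0.10 | 7         | 6     |
| 1/2 | 8 | 2 | 2       | 0.08 | 3         | 3     |
| 1/3 | 5 | 8 | 2       | 0.28 | 9         | 17    |
| 1/3 | 6 | 5 | 2       | 0.28 | 11        | 15    |
| 1/3 | 7 | 2 | 2       | 0.13 | 25        | 13    |
| 1/3 | 8 | 2 | 2       | 0.11 | 9         | 6     |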


for the region. Turkey seems to have a similar value, but not Western Canada, 
which appears to be slower. 
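
The order of magnitude of such a diffusion coefficient can be checked with a short Python sketch based on the diffusive time $L^2/D$ mentioned in the Introduction. It assumes that L is the box side in meters and that the characteristic time is the catalog time span divided by the number of time intervals; which table entries were actually combined for the quoted values is not stated, so the example below is only indicative. The function and constant names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def diffusion_coefficient(box_km, span_years, n_intervals):
    """Rough D = L**2 / tau in m^2/s.

    L is the box side and tau the duration of one time interval
    (catalog span divided by the number of intervals).
    """
    length_m = box_km * 1.0e3
    tau_s = span_years / n_intervals * SECONDS_PER_YEAR
    return length_m ** 2 / tau_s

# California, q = 1, m = 4: box of about 220 km, 1932-2001, t = 25.
d_california = diffusion_coefficient(220.0, 2001 - 1932, 25)
```

For this example the result is a few hundred m²/s, the same order as the value of around 500 m²/s quoted for California at magnitude threshold 4.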

In general, the mutual information increases with decreasing q as well. The 
difference is higher (more information can be extracted) for high magnitude 
cutoffs in the case of q → 0, which is therefore more useful for seismic hazard 
assessment. It is also interesting to note that the “magnetization” increases, 
approaching 50% of active cells (that is, null magnetization), as the mutual 
information becomes higher. 

The transition probabilities represent the rules for the CA. Although they 
are stored in tables, they can be better visualized as in Fig. 5. Gray 
represents the situation in which the cell is initially inactive, and black the 
situation in which the cell is initially active. There is always a value of the 
“energy” in Eq. (3) above which the cell is more likely to change its initial 
state (at times, the probability oscillates around 50% over an interval of 
energies). That value is marked, symbolically, as a vertical 
line in the figure’s bar. The value is not the same for an initially inactive and 
for an initially active cell. It should be noted that this behavior is consistent 
with an Ising-like system, where the probability of changing the state becomes 
higher as the “energy” increases. We also point out that this representation can 
describe almost all the situations we tried, in all the regions. Some regions 
do not present enough data for the rules, since the threshold magnitudes are 
too high, and no reliable histogram can be obtained. The behavior is analogous to 
that of a ferromagnetic material, where the cells tend to adopt the same state as their 
neighborhoods. Depending on the difference between the values for changing 
an initially inactive and an initially active cell, we can interpret that the 
external field increases (B < 0) or decreases (B > 0) the probability of changing 
the state. For California, there is no clear trend in the sign of the external 
field. If we focus on the results for q = 1/3, and for other q with low magnitude 
thresholds, it is positive, so that the plate boundary conditions favor the 
relaxation of the stresses. However, for high magnitudes and q = 1 or 1/2, the 
opposite behavior is found. In Turkey, the general trend is B < 0, so that the 
external field is constantly increasing the probability of changing the state. 
The Western Canada catalog is not clear either, but in this case because of 
the lack of data. 

Fig. 5: Rules for the CA. See text for explanation. 

The fact that for configurations with E < 0 (the surrounding activity state 
is the same as that of the central cell), the state is always reinforced 
(both active and inactive) is a typical feature of Ising-like behavior. Taking this 
into account, an active region loads its neighborhood when it releases energy. 
This feature is also contained in the Cellular Automata used in the literature 
to simulate seismicity [9, 10, 44], and leads to the activation of 
neighboring areas if they are close to the rupture point. However, the results 
obtained here also point out that an active region becomes quiescent because 
of the neighboring quiescence. This is in accordance with Griffith’s principle, 
in which cells are broken when the release of elastic energy exceeds the surface 
energy cost [45]. If a cell releases energy and the surrounding areas are not 
near the rupture point, they will absorb this energy in an elastic way, without 
becoming active, so that when the initially active cell has released all the excess 
energy, it becomes inactive as well. When E > 0, the transition to a changed 
state is less clear, mainly because of the non-uniformity in the stress field. 
In that case, B is different for each cell, as shown before. 
Fig. 6: Probabilistic Activation Maps for the next intervals of time for California 
with q = 1, m = 6 (35 years), and m = 7 (35 years). 

Fig. 7: Probabilistic Activation Maps for the next intervals of time for California 
with q = 1/2, m = 6 (17 years), and m = 7 (35 years). 



Figures 6-8 show the Probabilistic Activation Maps obtained for Califor¬ 
nia, Figs. 9-11 represent the results for Turkey, and Figs. 12-14 give the maps 
for Western Canada. 

As might be expected, the higher probabilities of occurrence lie on the 
principal faults: in particular, the San Andreas fault is delineated with q = 1/3 
up to magnitude 7. After that, the most energetic spots are near the Big Bend 
and near the Garlock fault. This feature is seen for all q, so we would expect the 
bigger earthquakes (around magnitude 7) to occur at those places. In fact, 
two earthquakes of magnitude 6 have occurred near the Big Bend after the end 
of our data set. 

We find two interesting locations in the Turkey catalog: the earthquakes 
occurring in the Aegean Sea and the Karliova junction, where the principal 
faults in the region converge. As can be seen, the higher probabilities of 
occurrence for the North Anatolian fault lie at both ends of the 
fault. That result coincides with the forecast made in 1996 by Stein et al. 
[33]. They assumed that earthquakes interact, as we also do. However, some 
earthquakes have occurred since that time at the mentioned places, so our maps 
show a continuation of the activity in those regions. Note also that they are 
hazard maps (in the sense explained before), and not forecasts. Nevertheless, it 
is interesting to see the coincidence in the locations. As before, as q decreases, 
the probabilities increase for the higher magnitudes. Only one earthquake of 
magnitude 6 has occurred after the time covered by our catalog, and it was 
located in the Aegean Sea. 

Fig. 8: Probabilistic Activation Maps for the next intervals of time for California 
with q = 1/3, m = 6 (14 years), m = 7 (35 years), and m = 8 (35 years). 

Fig. 9: Probabilistic Activation Maps for the next intervals of time for Turkey with 
q = 1, m = 6 (52 years), and m = 7 (52 years). 
Fig. 10: Probabilistic Activation Maps for the next intervals of time for Turkey with 
q = 1/2, m = 6 (35 years), and m = 7 (52 years). 

Fig. 11: Probabilistic Activation Maps for the next intervals of time for Turkey with 
q = 1/3, m = 6 (52 years), m = 7 (52 years), and m = 8 (52 years). 


In Western Canada we find that the higher energy releases are expected 
to be related to the Explorer plate, affecting both Vancouver and the Queen 
Charlotte islands. Around 10 earthquakes of magnitude 5 have occurred in 
the sea between these two islands after the data finish. By means of a quick 
calculation, only with a threshold based upon <7 = 1/3 these 10 earthquake of 
magnitude 5 added up surpass one of magnitude 6. With <7=1 and q = 1/2, 
that is not true. The locations are also coincident with the white spot in 
Fig. 12: Probabilistic Activation Maps for the next intervals of time for Western
Canada with q = 1, m = 6 (76 years), and m = 7 (152 years).

Fig. 13: Probabilistic Activation Maps for the next intervals of time for Western
Canada with q = 1/2, m = 6 (76 years), and m = 7 (61 years).

Fig. 14: Probabilistic Activation Maps for the next intervals of time for Western
Canada with q = 1/3, m = 6 (61 years), and m = 7 (152 years).


Fig. 14 just under the Queen Charlotte Islands, and that location does not
appear with high probability for the other values of q used. However, since the
time intervals involved are long, we cannot decide which map is best, or
more helpful for different purposes. Note that the errors change slightly with
the chosen values for q. However, the Hamming distance, as said before, is the
usual way to compare two binary patterns.
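
As a minimal illustration of the comparison mentioned above, the Hamming distance between two binary activation patterns can be sketched as follows. The function name and the two example patterns are hypothetical and are not taken from the catalogs used in this chapter.

```python
def hamming_distance(pattern_a, pattern_b):
    """Count the cells in which two equally sized binary patterns differ."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must have the same number of cells")
    return sum(a != b for a, b in zip(pattern_a, pattern_b))


# Hypothetical 2 x 3 activation patterns, flattened row by row.
forecast = [1, 0, 0, 1, 1, 0]
observed = [1, 1, 0, 0, 1, 0]
distance = hamming_distance(forecast, observed)
```

Dividing the distance by the number of cells gives a normalised error that can be compared between grids of different size.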

In all cases, the places where these extreme events are foreseen are
consistent with the tectonic setting. Other hazard maps for these regions, which
show the probability of exceedance in 50 years, delineate the principal faults.
The important point of our method is the idea that a particular time since
the last data needs to be checked. The time-dependence is important in order
to obtain the transition probabilities. If we average the energy, we should find
the temporal and spatial scales for the averaging. If we use the same scales as
those found in the model, the patterns change, so the estimate would not be
time-independent. In a time-independent estimation, the energy (or magnitude)
used for the calculations is usually the highest found in each seismogenic zone.
That may change in time, depending on the time span of the catalog. So, if we
compare the patterns with the highest magnitudes, or add up the energy, we find
that they are time-dependent. Note that since the time span is long and the spatial
resolution is low, the result cannot be viewed as a prediction, but as a forecast,
or a Probabilistic Activation Map; it is not, however, static. Note also that
no attenuation laws and no site responses are applied, so these Probabilistic
Activation Maps give the probabilities of surpassing certain energies
(magnitudes) in the different studied areas. They are therefore to be understood
as seismic hazard maps, rather than as actual forecasts.


5 Conclusions 

The method proposed in [23] has been applied to three catalogs, and the results
obtained have been shown to be consistent with previous studies. The seismic
activity patterns can be translated to an Ising Cellular Automaton (CA) model to
obtain estimates of the probabilities of surpassing certain energies (magnitudes)
at each cell of said CA. The Probabilistic Activation Maps obtained may be
useful for time-dependent seismic hazard assessment, although the transition
probabilities are static. This methodology is also applicable to other fields,
e.g. ecological evolution, as long as the system is described in terms of a CA,
that is, a coarse-graining is carried out in time, space and state to describe
its main features in a discrete fashion. The evolution of the obtained
patterns is given by the rules of the CA that maximize the delayed mutual
information.

The idea from previous results that a lower number of cells usually
maximizes the mutual information is reinforced. It reflects the large-scale
nature of earthquake occurrence. This, together with the different periods of
time for each magnitude threshold, can be roughly explained as a diffusion of
the stresses, which transmit the information to the whole system.

Our Probabilistic Activation Maps for California mark the Big Bend and
Garlock faults as the places where higher magnitude releases might be
expected. In Turkey, the higher seismic hazard corresponds to the Aegean Sea
and the Karliova junction, both extremes of the North Anatolian fault. Finally,
in Western Canada, we expect the higher seismic activity around Vancouver
and the Queen Charlotte Islands, related to the Explorer plate. All these
results are consistent with the different tectonic settings.

Finally, we conclude that evolving CA models, although simplified,
capture some interesting behavior of the seismic interactions. More work will be
done to see if this methodology gives us more insight into regional seismicity
and its relationship to statistical mechanics. In particular, we are interested
in improving the resolution in both space and time, and in seeing whether the
fractal nature of seismicity can be used for this purpose. Other possible
generalizations of this work are to add more states for the cells, and to try a
Potts model instead of an Ising one. Finally, the error analysis should be
improved, in the sense that it should be more sensitive to changes in the
parameters and to how different thresholds in the probability maps affect the
results.
Abstract. Self-exciting dynamos are nonlinear electro-mechanical engineering
devices, or naturally-occurring magnetohydrodynamic fluid systems, that convert
mechanical energy into magnetic energy without the help of permanent magnets. Hide
et al. [1] introduced a nonlinear system of three coupled ordinary differential
equations to model a self-exciting Faraday disk homopolar dynamo. Since they
investigated only a small selection of possible behaviours, including two examples
of chaotic behaviour, Moroz [2] performed a more extensive analysis of the dynamo
model, including producing bifurcation transition diagrams and generating
unstable periodic orbits for the two chaotic examples. We now extend that analysis and
use ideas from topology [3] and results from a corresponding analysis of the Lorenz
attractor to identify a possible template for the HSA dynamo.


Keywords: Dynamos, Unstable periodic orbits, Templates 


1 Introduction 

In 1996 Hide, Skeldon and Acheson [1] (hereafter denoted by HSA) introduced
a nonlinear model for a self-exciting Faraday disk dynamo as a simple analogue
for the heat storage capacity in the oceans, thought to be a key factor in
the dynamical processes underlying the El Niño Southern Oscillation.
Self-exciting Faraday disk dynamos, such as the HSA dynamo, are of interest since
they contain some of the key ingredients of large-scale naturally occurring
magnetohydrodynamic dynamos, while being of considerably lower dimension
and therefore more amenable to systematic study.

Since their original paper, there have been many extensions to the original
three-mode dynamo [4, 5, 6, 7, 8, 9, 10, 11] to include such effects as the
coupling of two or more dynamos together, an azimuthal eddy current, an external
magnetic field or a battery, etc. Many of the low order models of this family
have rich ranges of behaviour, with irregular reversals a common feature, as
well as steady, periodic and coexisting states (due to hysteresis effects). What
has been lacking is a means of distinguishing between these and other
models as a prelude to comparing them with the large-scale counterparts. One
possible way is via their spectra of unstable periodic orbits (upos), as well as
noting their behaviours when key parameters in the problem vary.

Topological methods have been developed to analyse three-dimensional
dissipative dynamical systems in the chaotic regime [3]. Such methods
supplement the more conventional approaches such as the calculation of Lyapunov
exponents and dimension calculations, although we have also included such
calculations in this chapter. The topological approach proceeds by identifying
the expansion and contraction mechanisms which are involved in creating the
strange attractor. This leads to a branched manifold, also called a template
or knot holder, on which the upos are organised in a unique way. Certain
topological invariants are computed from a chaotic time series. These
invariants, usually determined from the lowest order upos, are the (a) Gauss linking
numbers, (b) relative rotation rates and (c) templates themselves. One is then
able to determine whether two dynamical systems are equivalent, whether a
model accurately represents a physical system, etc.

In a recent chapter, Moroz [2] returned to the original HSA study, which
only investigated a very small selection of possible parameter values, and
produced bifurcation transition diagrams for the two examples of chaotic dynamo
behaviour reported by HSA. In addition, first return maps were used to obtain
unstable periodic orbits for these and other examples, but no attempt was
made to identify the branched manifold for the underlying attractor. We
rectify this now by reporting the linking number calculations that were used to
identify the template, using numerical algorithms developed by Bob Gilmore.

The chapter is organised as follows. In Sect. 2 we review the derivation and
the salient features of the HSA dynamo, including the linear stability analysis.
In Sect. 3 we review how Koga [12] constructed the Poincaré section for his
calculation of upos in the Lorenz equations. Section 4 summarises certain
results from [2] and introduces two new cases to be studied here. Section 5
explains the template analysis, following [3], and Sect. 6 uses these ideas, and
numerical algorithms for the computation of Gauss integrals provided by Bob
Gilmore, to compute tables of linking numbers for both the Lorenz and the
HSA equations. We summarise our results in Sect. 7.


2 The Hide, Skeldon, Acheson Dynamo 

We begin by introducing the Hide, Skeldon, Acheson (HSA) dynamo, following 
the treatment given in [1]. 

2.1 The HSA Equations 

The HSA dynamo is a system of three coupled nonlinear ordinary differential
equations for an electrically conducting Faraday disk, connected in series with
a coil and a motor via sliding contacts attached to the axle and the rim of the
disk. The disk is driven into rotation with angular speed $\Omega(t)$ by a steady
applied couple $G$. In the presence of a magnetic field, an e.m.f. is produced in
the rotating disk, and a current $I(t)$ flows through the coil and motor.

Applying torque balance to the motor and to the disk respectively gives
the following two nonlinear equations:

\[ B\dot{\omega} = HI - D\omega, \tag{1a} \]

\[ A\dot{\Omega} = G - MI^2 - K\Omega. \tag{1b} \]

Here $\omega(t)$ is the angular speed of the armature of the motor; $A$ is the
moment of inertia of the disk, $K$ its coefficient of mechanical friction; $B$ is
the moment of inertia of the armature of the motor, $D$ its coefficient of
mechanical friction; $HI$ is the torque on the armature, produced by the current;
$D\omega$ is the torque due to mechanical friction in the motor; $K\Omega$ is
the mechanical friction in the disk; $M/2\pi$ is the mutual inductance between
the coil and the rim of the disk, and the dot denotes differentiation with
respect to $t$. The final equation, for $I$, comes from identifying the e.m.f.s from
the various components of the dynamo and applying Kirchhoff's Voltage Law.
The e.m.f. generated by the moving disk, $M\Omega I$, is balanced by the voltages
$RI$, $L\dot{I}$ and $H\omega$ to give:

\[ L\dot{I} = M\Omega I - RI - H\omega, \tag{2} \]

where $L$ is the self-inductance of the system and $R$ is the series resistance.
Introducing dimensionless variables

\[ \tau = \frac{R}{L}\,t, \qquad x = \left(\frac{M}{G}\right)^{1/2} I, \qquad y = \frac{M}{R}\,\Omega, \qquad z = \frac{RBM^{1/2}}{HLG^{1/2}}\,\omega, \tag{3} \]

Equations (1) and (2) become

\[ \dot{x} = x(y - 1) - \beta z, \tag{4a} \]

\[ \dot{y} = \alpha(1 - x^2) - \kappa y, \tag{4b} \]

\[ \dot{z} = x - \lambda z, \tag{4c} \]

where the dot now denotes differentiation with respect to $\tau$, and

\[ \alpha = \frac{GLM}{R^2A}, \qquad \beta = \frac{H^2L}{R^2B}, \qquad \kappa = \frac{KL}{RA}, \qquad \lambda = \frac{DL}{RB}. \tag{5} \]

The four positive parameters appearing in (5) can be interpreted as follows:
$\alpha$ is a measure of the applied couple, $\kappa$ is a measure of the mechanical
friction in the disk, $\beta$ is a measure of the inverse moment of inertia in the
armature of the motor and $\lambda$ its mechanical friction.
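
The dimensionless system (4) is straightforward to integrate numerically. The following is a minimal sketch using a classical fourth-order Runge-Kutta step; it is not the integrator used for the results in this chapter, the function names are hypothetical, and the parameter values and initial condition are chosen purely for illustration.

```python
def hsa_rhs(state, alpha, beta, kappa, lam):
    """Right-hand side of the dimensionless HSA equations (4a)-(4c)."""
    x, y, z = state
    return (x * (y - 1.0) - beta * z,
            alpha * (1.0 - x * x) - kappa * y,
            x - lam * z)


def rk4_step(rhs, state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(state, k1, 0.5 * dt), *params)
    k3 = rhs(shifted(state, k2, 0.5 * dt), *params)
    k4 = rhs(shifted(state, k3, dt), *params)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, n_steps, *params):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(hsa_rhs, state, dt, *params)
        trajectory.append(state)
    return trajectory


# Illustrative values only: (alpha, beta, kappa, lambda) and a start near the origin.
params = (20.0, 2.0, 1.0, 1.2)
trajectory = integrate((0.1, 0.1, 0.1), 0.001, 5000, *params)
```

A quick sanity check on the right-hand side is that it vanishes at the trivial fixed point $(x, y, z) = (0, \alpha/\kappa, 0)$ of (4).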

2.2 Linear Stability Analysis 

To translate the trivial fixed point to the origin we introduce
$\hat{y} = \alpha/\kappa - y$, so that (4) becomes

\[ \dot{x} = (\hat{\alpha} - 1)x - x\hat{y} - \hat{\beta}\lambda z, \tag{6a} \]

\[ \dot{\hat{y}} = \kappa(\hat{\alpha}x^2 - \hat{y}), \tag{6b} \]

\[ \dot{z} = x - \lambda z, \tag{6c} \]

where $\hat{\beta} = \beta/\lambda$ and $\hat{\alpha} = \alpha/\kappa$. This gives

\[ \hat{\alpha} = \frac{GM}{RK}, \qquad \hat{\beta} = \frac{H^2}{RD}. \tag{7} \]

While $\hat{\alpha}$ is still a measure of the applied couple, $G$, $\hat{\beta}$ no
longer measures the inverse moment of inertia of the armature. The hats are
omitted in what follows.

The system (6) possesses three equilibrium solutions:

\[ \mathbf{x}_0 = (x_0, y_0, z_0) = (0, 0, 0), \tag{8a} \]

\[ \mathbf{x}_e = (x_e, y_e, z_e) = (x_e, \alpha x_e^2, x_e/\lambda), \tag{8b} \]

where $x_e = \pm[1 - (1 + \beta)/\alpha]^{1/2}$.

The trivial equilibrium $\mathbf{x}_0$ undergoes a pitchfork bifurcation when

\[ \alpha_s = 1 + \beta, \tag{9} \]

and a supercritical Hopf bifurcation on the line

\[ \alpha_h = 1 + \lambda. \tag{10} \]

The nontrivial equilibria $\mathbf{x}_e$ undergo subcritical Hopf bifurcations on

\[ \alpha = 1 + \frac{3\beta}{2} + \frac{\lambda[2\beta - (\kappa + \lambda)]}{2(\kappa - \beta)}, \tag{11} \]

provided $\alpha + \lambda/2 > 3\beta/2 + 1$ and $\beta \neq \kappa$.

All three equilibria undergo a codimension-two double-zero bifurcation at
the point

\[ (\beta, \alpha) = (\lambda, 1 + \lambda). \tag{12} \]
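
The Hopf condition for the nontrivial equilibria, $\alpha = 1 + 3\beta/2 + \lambda[2\beta - (\kappa + \lambda)]/[2(\kappa - \beta)]$, can be checked numerically. The sketch below assumes the characteristic cubic $s^3 + a_2 s^2 + a_1 s + a_0 = 0$ obtained by linearising (6) about $\mathbf{x}_e$; the coefficient expressions in the code were worked out for this illustration and are not quoted from [1] or [2]. A Hopf bifurcation requires $a_1 a_2 = a_0$ with $a_1 > 0$.

```python
def hopf_alpha(beta, kappa, lam):
    """Value of alpha on the Hopf line of the nontrivial equilibria."""
    return (1.0 + 1.5 * beta
            + lam * (2.0 * beta - (kappa + lam)) / (2.0 * (kappa - beta)))


def routh_hurwitz_gap(alpha, beta, kappa, lam):
    """Return a1*a2 - a0 for the cubic at the nontrivial equilibria of (6).

    Assumed coefficients, with e = alpha * x_e**2 = alpha - 1 - beta:
      a2 = kappa + lam - beta
      a1 = kappa * (lam - beta + 2 e)
      a0 = 2 kappa lam e
    """
    e = alpha - 1.0 - beta
    a2 = kappa + lam - beta
    a1 = kappa * (lam - beta + 2.0 * e)
    a0 = 2.0 * kappa * lam * e
    return a1 * a2 - a0


# Illustrative values of (beta, kappa, lambda) that satisfy the stated proviso.
beta, kappa, lam = 1.15, 1.0, 1.2
alpha_hopf = hopf_alpha(beta, kappa, lam)
gap_on_line = routh_hurwitz_gap(alpha_hopf, beta, kappa, lam)
```

The gap vanishes on the Hopf line; a negative gap, as found for example at $(\alpha, \beta, \kappa, \lambda) = (20, 2, 1, 1.2)$, indicates that the nontrivial equilibria are unstable there.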


2.3 Parameter Regimes 

HSA presented time series and phase portraits for two isolated examples of
chaotic behaviour:

\[ (\alpha, \beta, \kappa, \lambda) = (20.0, 2.0, 1.0, 1.2), \tag{13a} \]

\[ (\alpha, \beta, \kappa, \lambda) = (100.0, 1.01, 1.0, 1.0), \tag{13b} \]

the latter having many more oscillations around the nontrivial equilibria
before reversals than the former.

The isolated nature of these examples was addressed in [2], where bifurcation
transition curves were computed for these two choices of $(\alpha, \kappa, \lambda)$
as functions of $\beta$ for the whole range in which chaotic behaviour was found
(see [2], Figs. 2 and 4). In addition, bifurcation transition curves were also
presented for other choices of $\kappa$ and $\lambda$.

As mentioned in [2], the choice $\kappa = \lambda$ is degenerate, since (11) and
(12) above show that it corresponds to the double-zero bifurcation for the
trivial and nontrivial equilibria, coinciding with the vertical asymptote for the
subcritical Hopf bifurcation off the nontrivial equilibria. We therefore follow
[2] and perturb away from this degeneracy by selecting the same values for
$\lambda$ and $\kappa$ as for the $\alpha = 20$ problem.

Our linking number calculations and template analysis will therefore test
the dependence on $\alpha$ (a nondimensional measure of the applied couple $G$), as
well as on $\beta$.


3 The Lorenz Equations 


Because of their relevance to the present investigation, we find it convenient
to discuss certain aspects of the Lorenz equations [13]. The Lorenz equations

\[ \dot{x} = \sigma(y - x), \tag{14a} \]

\[ \dot{y} = rx - y - xz, \tag{14b} \]

\[ \dot{z} = -bz + xy, \tag{14c} \]

have three equilibrium solutions

\[ (x_0, y_0, z_0) = (0, 0, 0), \qquad (x_e, y_e, z_e) = (\pm\sqrt{b(r-1)}, \pm\sqrt{b(r-1)}, r - 1). \tag{15} \]

A linear stability analysis shows that, while all three equilibria can
undergo steady state bifurcations, only $(x_e, y_e, z_e)$ undergo subcritical Hopf
bifurcations and there are no double-zero bifurcations, unlike the HSA system.
Koga [12] introduced the change of variable

\[ \hat{z} = z - (r - 1), \]

which translates the $z$-coordinate of the nontrivial fixed points to the origin
and transforms (14) to

\[ \dot{x} = \sigma(y - x), \tag{16a} \]

\[ \dot{y} = -y + x(1 - \hat{z}), \tag{16b} \]

\[ \dot{\hat{z}} = -b(\hat{z} + r - 1) + xy, \tag{16c} \]

and the equilibria to

\[ (x_0, y_0, \hat{z}_0) = (0, 0, 1 - r), \qquad (x_e, y_e, \hat{z}_e) = (\pm\sqrt{b(r-1)}, \pm\sqrt{b(r-1)}, 0). \tag{17} \]

He takes the Poincaré section to be

\[ S = \{(x, y) : \hat{z} = 0,\ \dot{\hat{z}} > 0,\ x > 0\}, \tag{18} \]

and computes upos on this section, presenting a selection of the orbits,
together with a table of their periods for the choice $(r, \sigma, b) = (28, 10, 8/3)$.


4 Numerical Investigations 

Moroz [2] presented various bifurcation transition curves for $\alpha = 20$ and
$\alpha = 100$. The two examples of chaotic dynamics on which that study focused
were chosen from the middle of the chaotic regime ($\alpha = 20$) and from near the
loss of stability of chaotic behaviour to steady dynamo action ($\alpha = 100$). A
selection of unstable periodic orbits was presented, but no attempt was made
to identify the template of the underlying attractor nor to see how it was
affected by different parameter choices. We rectify that here.

We present results for three instances of the $\alpha = 20$ and one of the
$\alpha = 100$ cases. In the former, we choose values of $\beta$ from both extremes of
the bifurcation transition diagram, representing the loss of stability to steady
states (low values of $\beta$) and loss of stability to a stable periodic state (high
values of $\beta$). In addition, we return to the two main examples of [2]. Our goals
in this chapter are to determine the effects of varying $\beta$ and of varying $\alpha$
on the linking number calculations and the template identification, albeit for
a very small selection of parameter values. A complete study is beyond the
scope of the present work.

Since the linking numbers and template analysis for the Lorenz attractor
will prove key to our analysis, we include pertinent results for the classic choice
of $(r, \sigma, b) = (28, 10, 8/3)$ when appropriate.

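Following Koga [12] we translate the variable $y(t)$ by its nontrivial
equilibrium state, $Y(t) = 1 + \beta - y(t)$, where $y$ is the variable of (4)
(equivalently, $Y = y - \alpha x_e^2$ in terms of the variable $y$ of (6)), so that
(6) becomes:

\[ \dot{x} = (\alpha - 1 - \alpha x_e^2)x - xY - \beta\lambda z, \tag{19a} \]

\[ \dot{Y} = \kappa(\alpha x^2 - Y - \alpha x_e^2), \tag{19b} \]

\[ \dot{z} = x - \lambda z, \tag{19c} \]

and take the Poincaré section to be

\[ S = \{(x, z) : Y = 0,\ \dot{Y} > 0,\ x > 0\}. \tag{20} \]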

4.1 The Four Examples 

We begin by presenting the $(x, Y)$ phase portraits and the $x(t)$ time series
for the four examples of interest. In all of our integrations we fixed $\lambda = 1.2$
and $\kappa = 1$. Figure 1 shows the HSA attractor for $\alpha = 20$ and $\beta = 1.25$,
close to the loss of stability to steady dynamo behaviour. Figures 2 and 3 show
the corresponding plots for $\beta = 2$ (in the middle of the chaotic regime) and
$\beta = 2.6$ (close to the loss of chaotic to stable periodic behaviour) respectively.
What is evident from these three cases is that the number of oscillations
about each of the nontrivial fixed points decreases as $\beta$ increases. Our fourth
example, shown in Fig. 4, is for $\alpha = 100$ and $\beta = 2$ and again represents an
example close to the loss of stability to steady states (see [2]).

Fig. 1: The phase portrait in the $(x, Y)$-plane and the time series of $x(t)$ for
$\alpha = 20$, $\beta = 1.25$, $\lambda = 1.2$ and $\kappa = 1.0$.

Fig. 2: The phase portrait in the $(x, Y)$-plane and the time series of $x(t)$ for
$\alpha = 20$, $\beta = 2.0$, $\lambda = 1.2$ and $\kappa = 1.0$.

Fig. 4: The phase portrait in the $(x, Y)$-plane and the time series of $x(t)$ for
$\alpha = 100$, $\beta = 2.0$, $\lambda = 1.2$ and $\kappa = 1.0$.

We computed the Lyapunov exponents and the Lyapunov dimension for
these four cases and compared them with those for the Lorenz equations for
their classic parameter choices. We used the Kaplan-Yorke estimate for the
Lyapunov dimension $D_L$, where $D_L = 2 + \lambda_1/|\lambda_3|$ and $\lambda_1$,
$\lambda_3$ denote the largest and smallest Lyapunov exponents. The results for the
HSA system are shown in Table 1. For $\alpha = 20$ the magnitudes of the Lyapunov
exponents and the Lyapunov dimension decrease as $\beta$ increases. For the Lorenz
equations with $(r, \sigma, b) = (28, 10, 8/3)$ we obtained $(0.9076, 0, -14.574)$ with
our codes, giving $D_L = 2.0623$. The Lorenz attractor is therefore more strongly
contracting than the HSA attractor.

Table 1: Lyapunov exponents and Lyapunov dimension for the four examples of the
HSA dynamo considered here.

  $\alpha$   $\beta$   Lyapunov exponents      Lyapunov dimension
  20         1.25      0.4333, 0, -1.0066      2.4305
  20         2.0       0.2837, 0, -0.9298      2.3051
  20         2.6       0.1850, 0, -0.8639      2.2142
  100        2.0       0.4885, 0, -0.8922      2.5475
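
The Kaplan-Yorke estimate can be reproduced from the tabulated exponents. The following is a minimal sketch of the general formula, which reduces to $D_L = 2 + \lambda_1/|\lambda_3|$ for a three-dimensional flow with one positive and one zero exponent; the function name is hypothetical and this is not the code used to produce Table 1.

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension from a spectrum of Lyapunov exponents."""
    ordered = sorted(exponents, reverse=True)
    partial_sum = 0.0
    for j, exponent in enumerate(ordered):
        # Stop at the first exponent that makes the running sum negative.
        if partial_sum + exponent < 0.0:
            return j + partial_sum / abs(exponent)
        partial_sum += exponent
    # The running sum never became negative: the estimate is the full dimension.
    return float(len(ordered))


hsa_dimension = kaplan_yorke_dimension([0.4333, 0.0, -1.0066])
lorenz_dimension = kaplan_yorke_dimension([0.9076, 0.0, -14.574])
```

Rounded to four decimal places, these two calls give the values quoted for the first row of Table 1 and for the Lorenz equations.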

4.2 Unstable Periodic Orbits 

Upos were found as close returns on the Poincaré section as follows. The HSA
equations were integrated for 60,000s with a time step of 0.001s. A close return
was determined from the condition

\[ \|\mathbf{Y}_i - \mathbf{Y}_j\| = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2} < \epsilon, \]

where $\epsilon = 0.005$. Moreover the method of Hénon [14] was used to guarantee
that all trajectories landed precisely on the Poincaré section.
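
The close-return test above can be sketched as a search over successive intersections with the Poincaré section. This is an illustration under stated assumptions rather than the code used here: the function name, the `max_period` cut-off and the synthetic data are hypothetical, and the section points are assumed to have been computed already.

```python
import math


def close_returns(section_points, eps, max_period):
    """Return (i, p) pairs for which point i and point i + p lie within eps.

    section_points is a sequence of (X, Y, Z) intersections with the section,
    in the order in which the trajectory produced them; p counts returns.
    """
    hits = []
    n = len(section_points)
    for i in range(n):
        for p in range(1, max_period + 1):
            j = i + p
            if j >= n:
                break
            if math.dist(section_points[i], section_points[j]) < eps:
                hits.append((i, p))
    return hits


# Synthetic data: a sequence that repeats exactly after three returns.
demo_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)] * 3
demo_hits = close_returns(demo_points, eps=0.005, max_period=4)
```

Each hit marks a candidate orbit segment that closes after `p` returns; how `p` maps onto the symbolic period depends on the choice of section.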

Guided by the signs of the nontrivial equilibria, we adopted the protocol
of labelling trajectories using symbol sequences $R^m$ if the trajectory cycled $m$
times around the equilibrium $x_e > 0$, and $L^n$ if the trajectory cycled $n$ times
around the negative equilibrium $x_e < 0$. Each symbol $L$ or $R$ was taken to
correspond to one period. Thus, for example, $R^mL^n$ would be a period-$(m+n)$
orbit, which cycles $m$ times around $x_e > 0$ and $n$ times around $x_e < 0$.

Figure 5 shows a comparison of the histograms of upos for three of the
cases for the HSA dynamo with that for the Lorenz equations, obtained on
their respective Poincaré sections. It is clear from these histograms that the
number of distinct upos decreases as $\beta$ increases, and therefore that, for
a given choice of parameter values, the HSA dynamo does not contain all
possible upos of a given period (unlike the Lorenz equations). This becomes
evident in the template analysis below.

Moroz [2] shows a selection of upos for the cases of $(\alpha, \beta) = (20, 2.0)$ and
$(\alpha, \beta) = (100, 2.0)$. Here we show a selection of upos for the remaining two
cases of $(\alpha, \beta) = (20, 1.25)$ and $(\alpha, \beta) = (20, 2.6)$.

Figure 6 shows examples of the two lowest period upos found for the HSA
dynamo when $\alpha = 20$ and $\beta = 1.25$. No example of the period-2 $LR$ upo
was found for this case. Indeed Fig. 1 shows that one would expect upos of
higher period to occur. Recall that $\beta = 1.25$ is near to the loss of stability of
the oscillatory behaviour to steady dynamo action.

When $\beta = 2.6$, the $LR$ period-2 orbit, with period 2.444s, predominates
(see Fig. 5d). Instead of showing any of the lower order upos, we shall illustrate
this case with two different period-10 orbits, which were also used to verify
the HSA template. Figure 7 shows examples of an $RL(R^2L^2)^2$ upo with period
12.078s and an $(LR)^3L^2R^2$ upo with period 12.378s.

We can also compare upos found for different values of $\alpha$. Figure 8 shows
the $L^3R^3$ period-6 orbit for $\alpha = 20$ (upper two panels) compared to that for
$\alpha = 100$ (lower two panels). The former has period 7.353s, while the latter
has period 3.523s.

Fig. 5: The histograms of upos, plotted against period, for (a) the Lorenz equations
for $(r, \sigma, b) = (28, 10, 8/3)$, and the HSA dynamo for $\alpha = 20$ and
(b) $\beta = 1.25$, (c) $\beta = 2.0$ and (d) $\beta = 2.6$.


5 Template Analysis 

For the Lorenz equations at their classic parameter values, examples of upos
of all low order periods were found using the method of close returns on the
Poincaré section. For the HSA dynamo this is not so for either of the
examples of Fig. 9 in [1]. For this reason we calculated the linking numbers
for upos of higher periods, in order to verify the validity of our template
choice.

The linking number $L(A, B)$ of two upos $A$ and $B$ is defined to be half
the number of signed crossings of $A$ and $B$. It can be computed in two ways:
either from a Gauss integral

\[ L(A, B) = \frac{1}{4\pi} \oint_A \oint_B \frac{(\mathbf{x}_A - \mathbf{x}_B) \cdot (d\mathbf{x}_A \times d\mathbf{x}_B)}{\|\mathbf{x}_A - \mathbf{x}_B\|^3}, \tag{21} \]

or by projecting the two upos onto a two-dimensional plane and counting
half of the number of signed crossings by eye [3]. In our calculations we used
both approaches, but primarily a numerical code very kindly supplied by Bob
Gilmore to compute the Gauss linking numbers via the integral in (21).

Fig. 6: The $(x, Y)$ phase portraits and $x(t)$ time series of a period-3 $L^2R$ (upper
pair) and a period-4 $L^2R^2$ upo (lower pair) for $\alpha = 20$ and $\beta = 1.25$. The
$L^2R$ orbit has a period of 4.207s, while the $L^2R^2$ orbit has a period of 5.439s.
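
To make the Gauss integral concrete, the following sketch evaluates the double line integral for two closed curves given as lists of points, using a midpoint rule on the polygon segments. It is an independent illustration, not the code supplied by Bob Gilmore; the function names and the two test rings (a Hopf link and an unlinked pair) are hypothetical.

```python
import math


def gauss_linking_number(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss linking integral for two closed polygons."""
    def segments(curve):
        n = len(curve)
        out = []
        for k in range(n):
            p, q = curve[k], curve[(k + 1) % n]  # wrap around to close the curve
            mid = tuple(0.5 * (a + b) for a, b in zip(p, q))
            step = tuple(b - a for a, b in zip(p, q))
            out.append((mid, step))
        return out

    total = 0.0
    for mid_a, da in segments(curve_a):
        for mid_b, db in segments(curve_b):
            r = tuple(a - b for a, b in zip(mid_a, mid_b))
            cross = (da[1] * db[2] - da[2] * db[1],
                     da[2] * db[0] - da[0] * db[2],
                     da[0] * db[1] - da[1] * db[0])
            dist = math.sqrt(sum(c * c for c in r))
            total += sum(c * d for c, d in zip(r, cross)) / dist ** 3
    return total / (4.0 * math.pi)


def circle(centre, axis_u, axis_v, n=120):
    """Points on a unit circle spanned by two orthonormal axes."""
    return [tuple(c + math.cos(t) * u + math.sin(t) * v
                  for c, u, v in zip(centre, axis_u, axis_v))
            for t in (2.0 * math.pi * k / n for k in range(n))]


ring_a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
ring_b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
far_ring = circle((5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
linked = gauss_linking_number(ring_a, ring_b)
unlinked = gauss_linking_number(ring_a, far_ring)
```

For the two interlocked rings the estimate is close to an integer of magnitude one, with the sign set by the orientations of the curves, while for the separated rings it is close to zero. For upos extracted from a time series the same sum would be applied to the sampled orbit points, and the result rounded to the nearest integer.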

To verify the template and compute the self-linking numbers, we used
a second numerical algorithm also supplied by Bob Gilmore. This requires
information about the twisting and crossing of branches of the template, which
is given by the torsion matrix $T(i,j)$. Here $T(i,j)$ is the number of signed
crossings of the $i$th and $j$th branches. If $T(i,j) = 0$, the branches do not cross.
For the Lorenz system, there are two branches (corresponding to the cycling of the
upos around each of two nontrivial equilibrium states) and neither twists, so
that $T(i,j) = 0$ for $i,j = 1,2$. The right hand branch of the template lies in
front of the left hand branch, so that the layering information is $(1,-1)$. The
HSA system also has two nontrivial equilibria and so two branches, and orbits
for the Lorenz and HSA attractors are describable in terms of two symbols. The
algorithm also requires a listing of the upos in terms of period and symbol
sequence. The output is a table of linking numbers (including self-linking
numbers, whose values cannot be calculated by the Gauss integral (21)).














Fig. 7: The $(x, Y)$ phase portraits and $x(t)$ time series of two period-10 upos for
$\beta = 2.6$. The upper pair shows an $RL(R^2L^2)^2$ upo with period 12.078s, while the
lower pair shows an $(LR)^3L^2R^2$ upo with period 12.378s.


6 Identification of the HSA Template 

To identify a possible template for the HSA equations, we proceeded as follows.
Using the Lorenz equations as a test bed, we calculated linking numbers for
23 upos, namely all 21 orbits up to, and including, period 6, together
with two period-8 orbits, using the Gauss linking number code. While Gilmore
and Letellier [15] have produced tables of linking numbers for the Lorenz
equations for all orbits up to period 5, to the best of our knowledge this
is the first time a table for orbits up to period 6 has been computed and
presented in the published literature. Because 23 upos are involved, we have
found it convenient to include a simplified labelling of the orbits in Table 2,
with the linking numbers displayed in Table 3.
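
The number of distinct low-period orbits on two symbols can be checked by direct enumeration. The sketch below assumes that every primitive cyclic word in $L$ and $R$ corresponds to exactly one upo, which is appropriate only when all symbol sequences are realised (as reported here for the Lorenz equations at the classic parameter values, but not for the HSA dynamo). The function names are hypothetical.

```python
from itertools import product


def canonical_rotation(word):
    """Pick one representative of a word among all of its cyclic rotations."""
    return min(word[k:] + word[:k] for k in range(len(word)))


def is_primitive(word):
    """True if the word is not a repetition of a shorter block."""
    n = len(word)
    return all(word != word[:d] * (n // d) for d in range(1, n) if n % d == 0)


def distinct_orbits(period):
    """All primitive cyclic words of the given length on the symbols L and R."""
    words = ("".join(symbols) for symbols in product("LR", repeat=period))
    return sorted({canonical_rotation(w) for w in words if is_primitive(w)})


counts = {period: len(distinct_orbits(period)) for period in range(2, 7)}
total_up_to_period_6 = sum(counts.values())
```

The counts for periods 2 to 6 are 1, 2, 3, 6 and 9, giving 21 orbits in total, which together with the two period-8 orbits accounts for the 23 upos listed in Table 2.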

The structure of the Lorenz template is already known, and the template
verification code referred to in the previous section was used to compute
the full table of linking numbers for all the selected upos, including values
for the self-linking numbers. This is shown in Table 3. Linking numbers for
distinct upos are found as the off-diagonal entries, while the self-linking
number of each upo is shown along the diagonal. Note the reflectional
symmetry about this main diagonal.

Fig. 8: The $(x, Y)$ phase portraits and $x(t)$ time series of two period-6 $L^3R^3$ upos
for $\beta = 2.0$. The upper pair is for $\alpha = 20$ and has period 7.353s, while the
lower pair is for $\alpha = 100$ with period 3.523s.

Table 2: Upos of period ≤ 6, and two period-8 upos, for the Lorenz equations when
(r, σ, b) = (28, 10, 8/3).

Label   Period   Periodic orbit
2       2        LR
3_1     3        L^2R
3_2     3        R^2L
4_1     4        L^3R
4_2     4        LR^3
4_3     4        L^2R^2
5_1     5        L^4R
5_2     5        LR^4
5_3     5        L^3R^2
5_4     5        L^2R^3
5_5     5        LRL^2R
5_6     5        RLR^2L
6_1     6        L^5R
6_2     6        LR^5
6_3     6        L^4R^2
6_4     6        L^2R^4
6_5     6        L^3R^3
6_6     6        L^2R^2LR
6_7     6        R^2L^2RL
6_8     6        L^3RLR
6_9     6        R^3LRL
8_1     8        L^2R^2(LR)^2
8_2     8        R^2L^2(RL)^2

Table 3: Linking numbers for all upos of period ≤ 6, and two period-8 upos, for the
Lorenz equations when (r, σ, b) = (28, 10, 8/3). Self-linking numbers are along the
diagonal.

Orbit   2 3_1 3_2 4_1 4_2 4_3 5_1 5_2 5_3 5_4 5_5 5_6 6_1 6_2 6_3 6_4 6_5 6_6 6_7 6_8 6_9 8_1 8_2
2       1   1   1   1   1   2   1   1   2   2   2   2   1   1   2   2   2   3   3   2   2   4   4
3_1     1   2   1   2   1   2   2   1   3   2   3   2   2   1   3   2   3   3   3   4   2   4   4
3_2     1   1   2   1   2   2   1   2   2   3   2   3   1   2   2   3   3   3   3   2   4   4   4
4_1     1   2   1   3   1   2   3   1   3   2   3   2   3   1   4   2   3   3   3   4   2   4   4
4_2     1   1   2   1   3   2   1   3   2   3   2   3   1   3   2   4   3   3   3   2   4   4   4
4_3     2   2   2   2   2   3   2   2   3   3   4   4   2   2   3   3   4   5   5   4   4   7   7
5_1     1   2   1   3   1   2   4   1   3   2   3   2   4   1   4   2   3   3   3   4   2   4   4
5_2     1   1   2   1   3   2   1   4   2   3   2   3   1   4   2   4   3   3   3   2   4   4   4
5_3     2   3   2   3   2   3   3   2   4   3   5   4   3   2   4   3   4   5   5   5   4   7   7
5_4     2   2   3   2   3   3   2   3   3   4   4   5   2   3   3   4   4   5   5   4   5   7   7
5_5     2   3   2   3   2   4   3   2   5   4   6   4   3   2   5   4   5   6   6   6   4   8   8
5_6     2   2   3   2   3   4   2   3   4   5   4   6   2   3   4   5   5   6   6   4   6   8   8
6_1     1   2   1   3   1   2   4   1   3   2   3   2   5   1   4   2   3   3   3   4   2   4   4
6_2     1   1   2   1   3   2   1   4   2   3   2   3   1   5   2   4   3   3   3   2   4   4   4
6_3     2   3   2   4   2   3   4   2   4   3   5   4   4   2   5   3   4   5   5   6   4   7   7
6_4     2   2   3   2   4   3   2   4   3   4   4   5   2   4   3   5   4   5   5   4   6   7   7
6_5     2   3   3   3   3   4   3   3   4   4   5   5   3   3   4   4   5   6   6   5   5   8   8
6_6     3   3   3   3   3   5   3   3   5   5   6   6   3   3   5   5   6   7   8   6   6  10  11
6_7     3   3   3   3   3   5   3   3   5   5   6   6   3   3   5   5   6   8   7   6   6  11  10
6_8     2   4   2   4   2   4   4   2   5   4   6   4   4   2   6   4   5   6   6   7   4   8   8
6_9     2   2   4   2   4   4   2   4   4   5   4   6   2   4   4   6   5   6   6   4   7   8   8
8_1     4   4   4   4   4   7   4   4   7   7   8   8   4   4   7   7   8  10  11   8   8  13  15
8_2     4   4   4   4   4   7   4   4   7   7   8   8   4   4   7   7   8  11  10   8   8  15  13

We then used the Gauss linking number code to compute linking numbers
for upos extracted from the HSA dynamo for the cases of $\alpha = 20$ and
$\alpha = 100$ under consideration in this study.

Tables 4-6 show the linking numbers for the upos we used for $\alpha = 20$
and $\beta = 1.25$, $\beta = 2$ and $\beta = 2.6$ respectively, while Table 7 shows
those for $\alpha = 100$ and $\beta = 2$. The self-linking numbers are again along the
diagonal.

As is evident from the histograms in Fig. 5, unlike the Lorenz system, not
all low period upos were found in the HSA system in our time series for the
parameter values we considered. We were therefore compelled to use much
higher order orbits to verify the template than is usual. That using such high
order upos produced no discrepancies in the template verification code only
served to underline the validity of our calculations and guess.

In each of the four examples considered in this chapter, the linking numbers
were found to be compatible with the template for the Lorenz equations, and
hence with the same layering of the two torsion-free branches.
Table 4: Linking numbers for some of the upos found in the HSA dynamo for α = 20,
β = 1.25, λ = 1.2, κ = 1. Self-linking numbers are along the diagonal.

Period           3       3       4       5       5       6       7       7
        Orbit    L^2R    LR^2    L^2R^2  LR^4    L^4R    L^3R^3  L^4R^3  L^3R^4
3       L^2R     2       1       2       1       2       3       3       3
3       LR^2     1       2       2       2       1       3       3       3
4       L^2R^2   2       2       3       2       2       4       4       4
5       LR^4     1       2       2       4       1       3       3       4
5       L^4R     2       1       2       1       4       3       4       3
6       L^3R^3   3       3       4       3       3       5       5       5
7       L^4R^3   3       3       4       3       4       5       6       5
7       L^3R^4   3       3       4       4       3       5       5       6


Table 5: Linking numbers for some of the upos found in the HSA dynamo for α = 20,
β = 2, λ = 1.2, κ = 1. Self-linking numbers are along the diagonal.

Period                2             4             6             6             6             8             8
        Orbit         LR            R^2L^2        R^3L^3        RLR^2L^2      LRL^2R^2      (LR)^2L^2R^2  (RL)^2R^2L^2
2       LR            1             2             2             3             3             4             4
4       R^2L^2        2             3             4             5             5             7             7
6       R^3L^3        2             4             5             6             6             8             8
6       RLR^2L^2      3             5             6             7             8             11            10
6       LRL^2R^2      3             5             6             8             7             10            11
8       (LR)^2L^2R^2  4             7             8             11            10            13            15
8       (RL)^2R^2L^2  4             7             8             10            11            15            13


Table 6: Linking numbers for some of the upos found in the HSA dynamo for α = 20,
β = 2.6, λ = 1.2, κ = 1. Self-linking numbers are along the diagonal.

Period                2   4       6         6         8             8             10
Period  Orbits        LR  L^2R^2  L^2RLR^2  R^2LRL^2  L^2R^2(LR)^2  R^2L^2(RL)^2  RL(R^2L^2)^2
2       LR            1   2       3         3         4             4             5
4       L^2R^2        2   3       5         5         7             7             8
6       L^2RLR^2      3   5       7         8         11            10            12
6       R^2LRL^2      3   5       8         7         10            11            13
8       L^2R^2(LR)^2  4   7       11        10        13            15            18
8       R^2L^2(RL)^2  4   7       10        11        15            13            17
10      RL(R^2L^2)^2  5   8       12        13        18            17            19


Table 7: Linking numbers for some of the upos found in the HSA dynamo for α = 100,
β = 2, λ = 1.2, κ = 1. Self-linking numbers are along the diagonal.

Period                2   6       6         6         7       7       8       8
Period  Orbits        LR  R^3L^3  RLR^2L^2  LRL^2R^2  L^4R^3  L^3R^4  L^4R^4  (LR)^2L^2R^2
2       LR            1   2       3         3         2       2       2       4
6       R^3L^3        2   5       6         6         5       5       6       8
6       RLR^2L^2      3   6       7         8         6       6       6       11
6       LRL^2R^2      3   6       8         7         6       6       6       10
7       L^4R^3        2   5       6         6         6       5       6       8
7       R^4L^3        2   5       6         6         5       6       6       8
8       L^4R^4        2   6       6         6         6       6       7       8
8       (LR)^2L^2R^2  4   8       11        10        8       8       8       13
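
Two simple consistency checks can be applied to tables of this kind. The
Python sketch below is illustrative only; the function names are
hypothetical and the matrix is transcribed from Table 7. It checks that the
linking matrix is symmetric, as it must be since the linking number does not
depend on the order of the two orbits, and that two symbolic names which
differ only by a cyclic shift, such as R^4L^3 and L^3R^4, denote the same
periodic orbit.

```python
def canonical_rotation(word):
    """Lexicographically smallest cyclic rotation of a symbol string."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def same_periodic_orbit(word_a, word_b):
    """True if the two symbol strings are cyclic shifts of one another."""
    return (len(word_a) == len(word_b)
            and canonical_rotation(word_a) == canonical_rotation(word_b))


def is_symmetric(matrix):
    """True if a square integer matrix equals its transpose."""
    n = len(matrix)
    return (all(len(row) == n for row in matrix)
            and all(matrix[i][j] == matrix[j][i]
                    for i in range(n) for j in range(n)))


# Linking numbers transcribed from Table 7 (alpha = 100, beta = 2).
TABLE_7 = [
    [1, 2, 3, 3, 2, 2, 2, 4],
    [2, 5, 6, 6, 5, 5, 6, 8],
    [3, 6, 7, 8, 6, 6, 6, 11],
    [3, 6, 8, 7, 6, 6, 6, 10],
    [2, 5, 6, 6, 6, 5, 6, 8],
    [2, 5, 6, 6, 5, 6, 6, 8],
    [2, 6, 6, 6, 6, 6, 7, 8],
    [4, 8, 11, 10, 8, 8, 8, 13],
]

table_is_symmetric = is_symmetric(TABLE_7)
self_linking = [TABLE_7[i][i] for i in range(len(TABLE_7))]
same_name = same_periodic_orbit("RRRRLLL", "LLLRRRR")
mirror_pair = same_periodic_orbit("RLRRLL", "LRLLRR")
```

The mirror-image pair RLR^2L^2 and LRL^2R^2 is reported as two distinct
orbits, which agrees with their separate rows in the tables.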


7 Discussion 

In this chapter we have completed the study, begun in [2], of the classification
of chaos in the HSA dynamo using unstable periodic orbits. Moroz [2] selected
two examples of chaotic behaviour and extracted upos from time series for two
values of α, a dimensionless parameter which measures the couple applied to
the disk of the dynamo, driving it into rotation.

Bifurcation transition curves produced in [2] indicated that, for α = 20, the
chaotic behaviour occurred for a value of β mid-way between the transition
to steady states (low β) and the loss of stability to stable periodic motion
(high β). For α = 100, the chosen example was very close to the transition to
steady states. No attempt was made in [2] to identify the underlying attractor
using topological methods.

In this chapter, as well as considering two different values of α, we have
also investigated the effects of varying β, choosing two cases which lie
close to the transition either to steady or to periodic states. The
histograms shown in Fig. 5 indicate that as β increases, the number of
distinct upos decreases. Also, for the parameter values investigated, the
Lorenz system contains more distinct upos than does the HSA system.

In the present study, we have applied topological ideas, expounded by 
Gilmore [3], to compute linking numbers for the upos and, thereby, we have 
identified a possible template on which the orbits lie. This template turns out 
to be topologically identical to that for the Lorenz equations. 

As well as for the HSA system, we have also performed similar calculations
of upos and linking numbers for another dynamo model, the Extended
Malkus-Robbins dynamo [10, 11], a four-dimensional nonlinear system which
reduces to the Lorenz equations when β = 0. For certain choices of parameter
values, the Extended Malkus-Robbins equations are strongly contracting, with
a Lyapunov dimension $D_L < 3$. Since the system becomes effectively
three-dimensional, the topological methods described here are applicable.
Moroz et al. [16] were able to show that the branched manifolds describing
the projected attractors for small and large values of β were reflectionally
symmetric, of Lorenz type with rotation symmetry.

The HSA dynamo is the first of a class of low order dynamo models to
have its chaotic behaviour classified using topological methods [7]. Other such
models include effects such as an external magnetic field, a battery term, the
coupling of two or more dynamo units together, or an azimuthal eddy current
[5, 6, 8]. For these and other systems whose Lyapunov dimension $D_L < 3$, the
topological approach presented here has the potential to classify the
underlying chaotic attractor, and to distinguish between different chaotic
attractors.


Abstract. Methods to detect solitons and determine their parameters are
considered. The first, a simple observational test for soliton
identification, is based on determining the statistical relationships between
the amplitude, duration, and carrier frequency of the detected signals, and
comparing them with the relevant relationships from soliton theory. The
second method is based on the solution of the direct scattering problem for
the relevant nonlinear equations. As an example, the Derivative Nonlinear
Schrödinger (DNLS) equation has been considered. The integral reflection
coefficient, which should drop rapidly when a signal is close to the
N-soliton profile, has been used as a soliton detector. Application of this
technique to numerically simulated signals shows that it is more efficient
than the standard Fourier transform and can be used as a practical tool for
the analysis of outputs from nonlinear systems.

Keywords: Nonlinear MHD waves, Nonlinear Fourier analysis, Derivative
nonlinear Schrödinger equation


1 Introduction: Solitons in Geophysical Media 

Nonlinear waves and solitons are frequently observed in all geophysical
media: the solar corona [1], interplanetary space [2], Earth’s magnetosphere
[3, 4], topside ionosphere [5], atmosphere [6, 7], and Earth’s crust [8]. In
a nonlinear medium a disturbance with finite amplitude commonly evolves to
the soliton state [9, 10]. Modern theory predicts, and has mathematical tools
to describe, N-soliton structures and soliton turbulence gas [11, 12]. The
detection of the soliton component and the determination of its properties
demand the elaboration of special nonlinear methods of signal analysis.
Standard methods of spectral analysis based on the Fourier Transform (FT) are
well suited to the detection of linear waves and the determination of their
properties, but they are not very effective for the examination of highly
structured space plasma turbulence.

The simplest approach is based on the determination of the statistical re¬ 
lationships between amplitudes, duration, velocity, etc. of the observed signal 
ensemble. Then, the comparison with the theoretically predicted relationships 
for a given soliton class may be used as a simple observational test for its 
identification [3]. In this paper, we provide the necessary basic relationships 
for some types of solitons, and indicate the validity and limitations of these 
relationships. 

However, the above simple statistical method of soliton identification
requires the analysis of a substantial number of signals, desirably recorded
under the same external conditions. Thus, an approach that can be applied to
a case study is highly desirable.

The idea of “nonlinear Fourier analysis” of observational time series, based
on the numerical solution of the direct scattering transform (ST) associated
with the Korteweg-de Vries equation, was suggested and implemented by Osborne
et al. [13, 14]. This approach was applied to the analysis of ocean surface
waves. Later on, Hada et al. [15] suggested applying the ST associated with
the Derivative Nonlinear Schrödinger equation to a complex time series.

The approach of [15] is further developed in this paper. We have built 
an effective numerical algorithm to implement the ST. Below we give a short 
description of this algorithm, comprising calculations of discrete data of the 
scattering problem (otherwise, soliton parameters), and apply this technique 
to numerically simulated signals. 


2 Statistical Method for Soliton Detection 

A linear wave packet in a dispersive medium decays upon propagation due
to packet spreading. If dissipation is weak, the energy conservation law
predicts the following relationship between the observed amplitude $A$ and
duration $T$: $A^2 T \approx \mathrm{const}$. However, in any realistic
geophysical system the amplitude of the generated wave packets may vary over
a wide range, so this relationship is of little practical use. In contrast to
linear signals, the soliton amplitude $A$ is not a free parameter, but is
intrinsically related to other signal parameters, such as the duration $T$,
the nonlinear component of the velocity $V$, the carrier frequency $\omega$,
etc. The statistical relationships between them may be used as a simple
observational test for soliton identification [3].

To apply this method adequately, the basic relationships, as well as their
limitations, have to be taken into account. Of the many evolutionary
equations, we consider the following two main model equations.

2.1 Nonlinear Schrödinger Equation (NLS)

The NLS equation

$$ i\,\partial_t \psi + \lambda\,\partial_{xx}\psi = \nu\,|\psi|^2\psi \tag{1} $$

is quite universal. It is commonly used to describe weakly nonlinear and
dispersive wave packets. The medium nonlinearity produces higher harmonics,
which results in a slow (on the scale of the carrier wave) spatial-temporal
change of the packet envelope.

In the case of modulation instability, according to the Lighthill condition
$\lambda\nu < 0$, the wave breaks up with time into isolated wave packets.
The envelopes of these packets are the solitons of Eq. (1) (e.g., [16]):

$$ \psi = a\sqrt{-2\lambda/\nu}\;\mathrm{sech}\,[a(x - 2bt)]\,\exp[ibx + i\lambda(a^2 - b^2)t]. $$

This expression shows that the soliton amplitude $A = a\sqrt{-2\lambda/\nu}$
and its spatial scale $L = a^{-1}$ are not independent variables, as for
linear waves, but are coupled by the relationship
$(AL)^2 = -2\lambda\nu^{-1}$. However, the observed parameter is usually not
the spatial scale $L$ but the soliton duration $T = L/V$, where $V$ is the
soliton velocity relative to the registration site. Thus

$$ (AT)^2 = -2\lambda\nu^{-1}V^{-2}. \tag{2} $$

The relationship (2) can be used as a criterion for soliton identification.
When the background medium is motionless, the velocity $V$ equals the group
velocity, $V \simeq V_g = 2b$. In the case of a fast flow of the medium
relative to the detector, with a velocity $U$ which much exceeds $V_g$ (e.g.,
the solar wind), one should suppose that $V \simeq U$.

The right-hand side of Eq. (2) depends in the general case on the carrier
wave frequency $\omega$. However, the dependence of the product $AT$ on
$\omega$ is not the same for all waves described by the NLS equation: it
depends on the properties of the medium, namely on the wave dispersion law
and the nonlinearity coefficient. For example, for quasi-longitudinal
magnetohydrodynamic wave propagation in a plasma
$AT \propto \omega^{-1/2}$, whereas for quasi-perpendicular fast
magnetosonic wave propagation $AT \propto \omega^{-1}$ [3]. Further, for
waves at the surface of deep water [16] $AT = g\omega^{-3}$, where $g$ is the
gravitational acceleration.
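
In practice the statistical test amounts to estimating the exponent $p$ in
$AT \propto \omega^{p}$ from an ensemble of events and comparing it with the
theoretical value for the assumed wave mode. A minimal Python sketch of such
an estimate is given below. It is an illustration under stated assumptions:
the function name fit_power_law is hypothetical, an ordinary least-squares
fit in log-log coordinates is used, and the input data are synthetic.

```python
import math


def fit_power_law(omega, at_product):
    """Least-squares fit of log(AT) = log(C) + p * log(omega).

    Returns the exponent p and the prefactor C. All inputs must be positive.
    """
    xs = [math.log(w) for w in omega]
    ys = [math.log(v) for v in at_product]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, math.exp(mean_y - p * mean_x)


# Synthetic ensemble obeying AT = 3 / omega exactly (no measurement noise).
omega = [0.5, 1.0, 2.0, 4.0, 8.0]
at_product = [3.0 / w for w in omega]
exponent, prefactor = fit_power_law(omega, at_product)
```

With real observations the scatter of the points about the fitted line, and
hence the uncertainty of the exponent, should be examined before any
conclusion about the soliton nature of the signals is drawn.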


2.2 Derivative Nonlinear Schrödinger Equation

The DNLS equation

$$ \partial_{t'} b + i\delta\,\partial_{x'x'} b + \alpha\,\partial_{x'}\!\left(b|b|^2\right) = 0 \tag{3} $$

is not as universal as the NLS equation. It was first used to describe
weakly nonlinear dispersive Alfvén waves in the case of quasi-parallel
propagation [17]. Moreover, the DNLS equation was applied to describe
Alfvén solitary structures in magnetized dusty plasmas [18]. The elliptically
polarized field of the Alfvén wave can be presented in the complex form
$b(x, t) = b_y + i b_z$. The coefficients of Eq. (3) for the dispersive
Alfvén wave are as follows:

$$ \alpha = \frac{V_A}{4(1 - \beta)}, \qquad \delta = \frac{V_A^2}{2\Omega_i}, \tag{4} $$

where $V_A$ is the Alfvén velocity, $\beta = V_s^2/V_A^2$, $V_s$ is the sound
velocity, and $\Omega_i$ is the ion gyrofrequency. The parameter $\delta$ is
related to the ion inertia dispersion length: $2\delta/V_A = V_A/\Omega_i$.

The variables $x'$ and $t'$ are related to the original physical variables,
the coordinate $x$ and time $t$ in the laboratory coordinate system, by the
relationships

$$ x' = \varepsilon^2 (x - V_A t), \qquad t' = \varepsilon^4 t, \tag{5} $$

where $\varepsilon = A = \max|B|/B_0$ is the small amplitude of the magnetic
field disturbance. This change of variables is used in the derivation of
Eq. (3) by the perturbation technique [19].

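In order to avoid additional complications in the formulas we reduce the DNLS
equation to the following normalized form:

$$ \partial_\tau \psi + \partial_\xi\!\left(\psi|\psi|^2\right) + i\,\partial_\xi^2 \psi = 0. \tag{6} $$

This reduction uses the following change of scales:

$$ \tau = 2\Omega_i t', \qquad \xi = 2\Omega_i V_A^{-1} x', \qquad \psi = \tfrac{1}{2}(1 - \beta)^{-1/2}\, b. \tag{7} $$

We consider the case of a solution of Eq. (6) that decreases rapidly at
$|\xi| \to \infty$. The one-soliton solution of (6) for this case has the
form [15]

$$ \psi_{sol}(\xi, \tau; \lambda) = a(X; \lambda)\, e^{i\theta}, \tag{8} $$

where

$$ a(X;\lambda)^2 = \frac{8\lambda_i^2}{|\lambda|\cosh(4\lambda_i X) - \lambda_r}, \qquad X = \xi - 4\lambda_r \tau, $$

$$ \theta = -2\lambda_r \xi - 4\left(\lambda_i^2 - \lambda_r^2\right)\tau + \vartheta, \qquad \vartheta(X;\lambda) = 3\arctan\frac{\lambda_i \tanh(2\lambda_i X)}{|\lambda| - \lambda_r}. \tag{9} $$

This solution depends on the complex parameter
$\lambda = \lambda_r + i\lambda_i$ ($\lambda_i > 0$). It is determined up to
the soliton location at the initial moment and the initial phase.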
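
The one-soliton formulas can be checked numerically. The Python sketch below
is an illustration, not the authors' code. It assumes the normalized DNLS
equation in the form
$\partial_\tau\psi + \partial_\xi(\psi|\psi|^2) + i\partial_\xi^2\psi = 0$
and the soliton phase
$\theta = -2\lambda_r\xi - 4(\lambda_i^2 - \lambda_r^2)\tau + \vartheta$,
evaluates the soliton, and measures the residual of the equation with
central finite differences. The function names and the step h are
hypothetical choices.

```python
import cmath
import math


def dnls_soliton(xi, tau, lam):
    """One-soliton solution of the normalized DNLS equation for complex lam."""
    lam_r, lam_i = lam.real, lam.imag
    mod = abs(lam)
    big_x = xi - 4.0 * lam_r * tau
    amp_sq = 8.0 * lam_i ** 2 / (mod * math.cosh(4.0 * lam_i * big_x) - lam_r)
    vartheta = 3.0 * math.atan(lam_i * math.tanh(2.0 * lam_i * big_x) / (mod - lam_r))
    theta = -2.0 * lam_r * xi - 4.0 * (lam_i ** 2 - lam_r ** 2) * tau + vartheta
    return math.sqrt(amp_sq) * cmath.exp(1j * theta)


def dnls_residual(xi, tau, lam, h=1e-4):
    """Finite-difference residual of the normalized DNLS equation."""
    def psi(x, t):
        return dnls_soliton(x, t, lam)

    def cubic(x, t):
        value = psi(x, t)
        return value * abs(value) ** 2

    d_tau = (psi(xi, tau + h) - psi(xi, tau - h)) / (2.0 * h)
    d_xi_cubic = (cubic(xi + h, tau) - cubic(xi - h, tau)) / (2.0 * h)
    d_xi_xi = (psi(xi + h, tau) - 2.0 * psi(xi, tau) + psi(xi - h, tau)) / h ** 2
    return d_tau + d_xi_cubic + 1j * d_xi_xi


lam = 0.3 + 0.5j
peak_sq = abs(dnls_soliton(0.0, 0.0, lam)) ** 2
residual = max(abs(dnls_residual(x, 0.2, lam)) for x in (-1.0, 0.0, 0.7))
```

The squared peak amplitude equals $8(|\lambda| + \lambda_r)$, and the
residual is small compared with the individual terms of the equation, which
are of order unity or larger near the soliton center.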

One may see from the above formulas that the structure of DNLS solitons
is more complicated than that of NLS solitons. A feature of the DNLS
solitons is the nonuniform variation of phase: $\theta$ is not a linear
function of either $x$ or $t$. Hence, the frequency and wavenumber of the
soliton packet may be determined only locally, as
$\omega = -\partial\theta/\partial t$ and $k = \partial\theta/\partial x$.

The physical parameters of the DNLS solitons and the relationships between
them can be found by reverting to the physical variables $x$ and $t$ with
the help of the coordinate transforms (5) and (7). As a result, the following
expressions for the variables in (8) and (9) are obtained:

$$ X(x,t) = \frac{2\Omega_i}{V_A}\,\varepsilon^2\left[x - V_A\left(1 + 4\varepsilon^2\lambda_r\right)t\right], \tag{10} $$

$$ \theta(x,t) = -\frac{4\Omega_i\lambda_r}{V_A}\,\varepsilon^2\left[x - V_A\left(1 + 2\varepsilon^2\,\frac{\lambda_r^2 - \lambda_i^2}{\lambda_r}\right)t\right] + \vartheta(X;\lambda), \tag{11} $$

where $\varepsilon$ is the soliton amplitude. The magnetic field of the
soliton is expressed via $\psi_{sol}$ as follows:

$$ B_{sol}(x,t) = \varepsilon\, b_{sol}(x,t) = 2\varepsilon B_0\sqrt{1 - \beta}\;\psi_{sol}(x,t;\lambda). \tag{12} $$

Here $\psi_{sol}$ has been normalized by the condition

$$ 4(1 - \beta)\max|\psi_{sol}|^2 = 1, \quad \text{i.e.} \quad 32(1 - \beta)\left(|\lambda| + \lambda_r\right) = 1. \tag{13} $$

From (10) and (11) the group velocity of the soliton packet follows:

$$ V = V_A\left(1 + 4\varepsilon^2\lambda_r\right). \tag{14} $$

The spatial scale and time duration are

$$ L = V_A\left(8\varepsilon^2\Omega_i\lambda_i\right)^{-1}, \qquad T = L/V_A = \left(8\varepsilon^2\Omega_i\lambda_i\right)^{-1}. \tag{15} $$

The local frequency is determined by

$$ \omega = -\partial_t\theta = \omega_\infty + 2\varepsilon^2\Omega_i\,\partial_X\vartheta = \omega_\infty + \frac{12\varepsilon^2\Omega_i h}{1 + \left(1 + h^2\lambda_i^{-2}\right)\sinh^2(2\lambda_i X)}, \tag{16} $$

where $h = [32(1 - \beta)]^{-1}$ and
$\omega_\infty = -4\varepsilon^2\Omega_i\lambda_r$. The maximum of the local
frequency, $\omega_{max} = \omega_\infty + 12\varepsilon^2\Omega_i h > 0$, is
reached in the center of the soliton, $X = 0$. In the above formulas for $L$,
$T$, and $\omega$ the small correction terms of higher order in amplitude
have been omitted for brevity.

The nonuniform phase variation specific to DNLS solitons means that the
local frequency within the soliton packet grows from the limiting value
$\omega_\infty$ to the maximum $\omega_{max}$ and returns back to
$\omega_\infty$. For $\lambda_r > 0$ the limit $\omega_\infty$ and the
maximum $\omega_{max}$ have opposite signs, that is, the phase changes
nonmonotonically.

It is necessary to mention that the physical values $V$, $L$, $T$, and
$\omega_{max}$ are determined by only two independent real parameters,
because owing to the normalization (13) the real and imaginary parts of the
parameter $\lambda$ are coupled as follows:
$\lambda_i^2 = h^2 - 2h\lambda_r$.
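
The chain from the spectral parameter to the physical soliton parameters can
be written out as a short calculation. The Python sketch below is
illustrative only. It assumes the relations
$h = [32(1-\beta)]^{-1}$, $\lambda_i^2 = h^2 - 2h\lambda_r$,
$V = V_A(1 + 4\varepsilon^2\lambda_r)$,
$L = V_A(8\varepsilon^2\Omega_i\lambda_i)^{-1}$, $T = L/V_A$,
$\omega_\infty = -4\varepsilon^2\Omega_i\lambda_r$ and
$\omega_{max} = \omega_\infty + 12\varepsilon^2\Omega_i h$; the function name
and the numerical inputs are hypothetical and carry no observational meaning.

```python
import math


def dnls_soliton_parameters(beta, lam_r, eps, v_alfven, omega_i):
    """Physical parameters of a normalized DNLS soliton (leading order in eps)."""
    if beta >= 1.0:
        raise ValueError("the normalization requires beta < 1")
    h = 1.0 / (32.0 * (1.0 - beta))
    lam_i_sq = h * h - 2.0 * h * lam_r
    if lam_i_sq <= 0.0:
        raise ValueError("no soliton: lam_r must be smaller than h / 2")
    lam_i = math.sqrt(lam_i_sq)
    length = v_alfven / (8.0 * eps ** 2 * omega_i * lam_i)
    omega_inf = -4.0 * eps ** 2 * omega_i * lam_r
    return {
        "h": h,
        "lam_i": lam_i,
        "velocity": v_alfven * (1.0 + 4.0 * eps ** 2 * lam_r),
        "length": length,
        "duration": length / v_alfven,
        "omega_inf": omega_inf,
        "omega_max": omega_inf + 12.0 * eps ** 2 * omega_i * h,
    }


# Hypothetical inputs: beta = 0.5, lam_r = -0.5, amplitude 0.1,
# Alfven velocity 50 (arbitrary units), ion gyrofrequency 1 (arbitrary units).
params = dnls_soliton_parameters(0.5, -0.5, 0.1, 50.0, 1.0)
```

Only the amplitude and $\lambda_r$ are free once the plasma parameters are
fixed, which is the sense in which the soliton is a two-parameter object.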

The DNLS solitons can be visualized as a wave packet envelope of some
high-frequency signal only in the limiting case $\lambda_r \to -\infty$.
Indeed, in this case $\omega_{max} \approx \omega_\infty$, because
$\lambda_i^2 \approx 2h|\lambda_r| \to \infty$, so that
$(\omega_{max} - \omega_\infty)/\omega_\infty = -3h\lambda_r^{-1} \to 0$. The
frequency (16) then practically does not depend on $x$ or $t$:
$\omega = 4\varepsilon^2\Omega_i|\lambda_r|$. We find from (15)

$$ (\varepsilon T)^2 = \left(64\varepsilon^2\Omega_i^2\lambda_i^2\right)^{-1} = \left(128 h\varepsilon^2\Omega_i^2|\lambda_r|\right)^{-1} = \left(32 h\Omega_i\omega\right)^{-1}, $$

i.e. the relationship between $\varepsilon = A$, $T$, and $\omega$:

$$ (AT)^2 = (1 - \beta)\left(\Omega_i\omega\right)^{-1}. $$

The above relationship in the quasi-monochromatic limit
$\omega T \to \infty$ is simple enough for practical use. However, the above
simple method of soliton identification requires an analysis of a
statistically significant number of signals under the same external
conditions. An alternative method described below can be applied to a single
event.


3 Integral Reflection Coefficient as a Soliton Detector 


We demonstrate the proposed method using DNLS solitons as an example. For
consistency with [20] we rename the variables of the normalized equation
(6) as follows: $\psi(\xi,\tau) \to b(x,t)$. The exact solution of the DNLS
equation (6) may be reduced to the solution of a linear problem with a
well-elaborated algorithm. This algorithm is based on the solution of the
direct and inverse scattering problems for the auxiliary linear system [20]:

$$ \partial_x v_1 = -i\lambda v_1 + \sqrt{\lambda}\, b\, v_2, \qquad \partial_x v_2 = i\lambda v_2 + \sqrt{\lambda}\, b^* v_1, \tag{17} $$

where $\lambda$ is the spectral parameter. If the function $b(x,t)$ evolves
in time according to the DNLS equation (6), then the functions $v_1$, $v_2$
also satisfy another linear system:

$$ \partial_t v_1 = A v_1 + B v_2, \qquad \partial_t v_2 = C v_1 - A v_2, \tag{18} $$

where $B = -\sqrt{\lambda}\,(2\lambda b + b^* b^2 + i\partial_x b)$,
$C = -\sqrt{\lambda}\,(2\lambda b^* + (b^*)^2 b - i\partial_x b^*)$, and
$A = i(2\lambda^2 + \lambda b^* b)$. In other words, the compatibility
condition for these linear systems for all values of $\lambda$ is just the
DNLS equation.

Any solution of the system (17) can be decomposed over either of two
function bases, $\varphi, \bar\varphi$ or $\psi, \bar\psi$ (Jost functions).
These function bases are characterized by their asymptotic behavior at
$x \to -\infty$,

$$ \varphi \sim \exp(-i\lambda x), \qquad \bar\varphi \sim \exp(i\lambda x), \tag{19} $$

or at $x \to +\infty$,

$$ \psi \sim \exp(i\lambda x), \qquad \bar\psi \sim \exp(-i\lambda x). \tag{20} $$

The scattering coefficients $s_1(\lambda)$ and $s_2(\lambda)$ are determined
by the decomposition (similar to the combination of incident and reflected
waves)

$$ \varphi = s_1\bar\psi + s_2\psi. \tag{21} $$

The ratio $r(\lambda) = s_2/s_1$ for real $\lambda$ is the reflection
coefficient. Zeros of the function $s_1(\lambda)$ in the upper half-plane of
the complex variable $\lambda$ are the discrete eigenvalues of the spectral
problem (17); for such $\lambda = \lambda_n$, in view of (21), we have

$$ \varphi(x;\lambda_n) = s_2(\lambda_n)\,\psi(x;\lambda_n). $$

From the above it follows, taking (19) and (20) into account, that the
function $\varphi(x;\lambda_n)$ decays exponentially on both sides, which in
quantum mechanics corresponds to a “bound state”.

The function of real argument $r(\lambda)$, the eigenvalues $\lambda_n$, and
the normalization coefficients
$c_n = i\lambda_n^{1/2} s_2(\lambda_n)/s_1'(\lambda_n)$ constitute the
scattering data set. If $b(x,t)$ from (17) evolves in accordance with (6),
then the scattering data, owing to (18), vary in a simple way:

$$ r(\lambda,t) = r(\lambda,0)\exp(4i\lambda^2 t), \qquad \lambda_n(t) = \lambda_n(0), \qquad c_n(t) = c_n(0)\exp(4i\lambda_n^2 t). $$

This allows one to find $b(x,t)$ by solving first the direct scattering
problem for the given initial condition $b(x,0)$ and then the inverse
scattering problem for any $t$.

The observations provide the complex function $b(x)$ as a series sampled at
the $M + 1$ points $x_0, x_1, \ldots, x_M$. The onset and end of the examined
interval, $x_0$ and $x_M$, are to be chosen so that $|b(x)|$ is sufficiently
small there, e.g., less than $10^{-4}$. For the numerical solution of the
direct scattering problem it is necessary to find the coefficients
$s_1(\lambda)$ and $s_2(\lambda)$ for real $\lambda$, and to find all roots
of the equation $s_1(\lambda) = 0$ in the upper half-plane. In accordance
with the definition of the coefficients (21), the initial condition for the
system of differential equations (17) at $x = x_0$ is to be taken as
$\varphi_1(x_0;\lambda) = \exp(-i\lambda x_0)$ and
$\varphi_2(x_0;\lambda) = 0$ (see (19)). Then Eqs. (17) are numerically
integrated from $x_0$ to $x_M$. After that, $s_1(\lambda)$ and
$s_2(\lambda)$ can be found with the use of (20) and (21):

$$ s_1(\lambda) = \varphi_1(x_M;\lambda)\exp(i\lambda x_M), \qquad s_2(\lambda) = \varphi_2(x_M;\lambda)\exp(-i\lambda x_M). $$

Finally, for each real $\lambda$ the reflection coefficient
$r(\lambda) = s_2(\lambda)/s_1(\lambda)$ is calculated.
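
The integration procedure just described can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
authors' algorithm: the auxiliary system is taken in the form
$\partial_x v_1 = -i\lambda v_1 + \sqrt{\lambda}\,b\,v_2$,
$\partial_x v_2 = i\lambda v_2 + \sqrt{\lambda}\,b^* v_1$, a classical
fourth-order Runge-Kutta scheme is used with each step spanning two sampling
intervals (so that the midpoint value of $b$ is itself a sample), and the
function name and the Gaussian test profile are hypothetical.

```python
import cmath
import math


def scattering_coefficients(samples, x0, dx, lam):
    """Return (s1, s2) for samples b(x_j), x_j = x0 + j*dx, at parameter lam."""
    if len(samples) < 3 or len(samples) % 2 == 0:
        raise ValueError("need an odd number (>= 3) of samples")
    root = cmath.sqrt(lam)

    def rhs(b, v1, v2):
        return (-1j * lam * v1 + root * b * v2,
                1j * lam * v2 + root * b.conjugate() * v1)

    # Initial condition at the left edge: v1 = exp(-i*lam*x0), v2 = 0.
    v1, v2 = cmath.exp(-1j * lam * x0), 0j
    h = 2.0 * dx
    for j in range(0, len(samples) - 2, 2):
        b0, bm, b1 = (complex(s) for s in samples[j:j + 3])
        k1 = rhs(b0, v1, v2)
        k2 = rhs(bm, v1 + 0.5 * h * k1[0], v2 + 0.5 * h * k1[1])
        k3 = rhs(bm, v1 + 0.5 * h * k2[0], v2 + 0.5 * h * k2[1])
        k4 = rhs(b1, v1 + h * k3[0], v2 + h * k3[1])
        v1 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        v2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    x_end = x0 + (len(samples) - 1) * dx
    return v1 * cmath.exp(1j * lam * x_end), v2 * cmath.exp(-1j * lam * x_end)


# Hypothetical test profile: a small Gaussian packet on [-6, 6].
xs = [-6.0 + 0.01 * j for j in range(1201)]
packet = [0.5 * math.exp(-x * x) * cmath.exp(1j * x) for x in xs]
s1, s2 = scattering_coefficients(packet, -6.0, 0.01, 0.8)
reflection = s2 / s1

# A vanishing profile should give s1 = 1 and s2 = 0.
s1_zero, s2_zero = scattering_coefficients([0j] * 201, -1.0, 0.01, 1.0)
```

For real positive $\lambda$ this form of the system conserves
$|v_1|^2 - |v_2|^2$, so $|s_1|^2 - |s_2|^2 = 1$ provides a convenient check
on the accuracy of the integration.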

The method of calculation relies upon the fact that the discrete eigenvalues
are zeros of the function $s_1(\lambda)$ of the complex spectral parameter.
The following two-stage algorithm for efficient numerical calculation has
been elaborated. At the first stage, the values $\mathrm{Re}\,s_1(\lambda)$
and $\mathrm{Im}\,s_1(\lambda)$ on the real axis are calculated and the
points where these values are equal to zero are found. Then, starting from
the points obtained, the contours of zero level
$\mathrm{Re}\,s_1(\lambda) = 0$ and $\mathrm{Im}\,s_1(\lambda) = 0$ are
traced in the upper half-plane. Calculations continue until those contours
intersect, as illustrated in Fig. 4.

As a detector of DNLS solitons the integral reflection coefficient $R$ is
used, which decreases drastically (theoretically to zero) when the signal
under analysis is close to the N-soliton profile $b_N(x,t)$. This integral
reflection coefficient is introduced with the formula


OO 



Any N-soliton spatial profile is a reflectionless potential, and for such a
profile the value $R$ is exactly zero, whereas it is positive for any other
distribution $b(x)$. Any change in the spatial scale (that is, a change of
the sampling step $\delta$ over $x$ in the numerical values of the profile
under study) results in a change of the quantity $R$. For an exact N-soliton
profile these variations of scale yield a sharp minimum of the function
$R(\delta)$ at the correct step $\delta$. Thus, the integral reflection
coefficient is very sensitive to changes of the linear scale. Therefore, if
the analysis of an experimentally detected spatial profile with the scale
variation method gives a dependence $R(\delta)$ with an evident minimum, this
may imply the occurrence of a substantial soliton component in this
disturbance. As a by-product of this method, the correct value of the step
$\delta = \delta_s$ is determined. Using the determined value of $\delta$ one
can calculate the discrete data of the scattering problem and retrieve the
pure soliton part of the disturbance under study with the help of the known
formulas for the N-soliton solution.

Let us consider the situation when the complex function
$b(x) = b_y(x) + i b_z(x)$ is measured at a fixed observation site
$x^{(0)}$, whereas a wave disturbance $b(x - Vt)$ propagates past it with
unknown velocity $V$. It is supposed that the soliton has been formed outside
the observational region and does not undergo nonlinear deviations upon
propagation through this region. Thus, at the point $x^{(0)}$ we have the
time series $b^{(0)}(t_j) = b(x^{(0)} - Vt_j)$, $j = 0, 1, \ldots, M$, with
the time sampling step $t_{j+1} - t_j = \Delta t$. With the velocity $V$
unknown, the spatial structure $b(x)$ can be determined only up to the
spatial step $\delta = V\Delta t$. Therefore, the method, in lieu of the time
series $b^{(0)}(t_j)$, analyzes the spatial profile $b(x;\delta)$, determined
at the points $x_j = x_0 + j\delta$, with unknown step $\delta$. The
calculated integral reflection coefficient $R$ as a function of $\delta$
tends to zero at a certain $\delta = \delta_s$ if the wave $b(x - Vt)$ is an
exact N-soliton profile. Then the step $\delta_s$ is known, and the wave
velocity can be found as $V = \delta_s/\Delta t$.
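
The scale-variation scan can be outlined in Python as follows. The sketch is
illustrative and rests on several assumptions that are not taken from the
text: the integral reflection coefficient is approximated as the
trapezoidal integral of $|r(\lambda)|^2$ over a finite grid of real positive
$\lambda$, the auxiliary system is integrated with a fourth-order
Runge-Kutta scheme spanning two samples per step, and all function names,
grids and the Gaussian test record are hypothetical.

```python
import cmath
import math


def reflection_coefficient(samples, step, lam):
    """r(lam) = s2/s1 for a record treated as a profile with spacing `step`.

    The origin is placed at the first sample; shifting it changes only the
    phase of r. The number of samples should be odd.
    """
    root = cmath.sqrt(lam)

    def rhs(b, v1, v2):
        return (-1j * lam * v1 + root * b * v2,
                1j * lam * v2 + root * b.conjugate() * v1)

    v1, v2 = 1.0 + 0j, 0j
    h = 2.0 * step
    for j in range(0, len(samples) - 2, 2):
        b0, bm, b1 = (complex(s) for s in samples[j:j + 3])
        k1 = rhs(b0, v1, v2)
        k2 = rhs(bm, v1 + 0.5 * h * k1[0], v2 + 0.5 * h * k1[1])
        k3 = rhs(bm, v1 + 0.5 * h * k2[0], v2 + 0.5 * h * k2[1])
        k4 = rhs(b1, v1 + h * k3[0], v2 + h * k3[1])
        v1 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        v2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    x_end = (len(samples) - 1) * step
    return (v2 * cmath.exp(-1j * lam * x_end)) / (v1 * cmath.exp(1j * lam * x_end))


def integral_reflection(samples, step, lam_grid):
    """Assumed definition: trapezoidal integral of |r|^2 over lam_grid."""
    values = [abs(reflection_coefficient(samples, step, lam)) ** 2 for lam in lam_grid]
    return sum(0.5 * (values[k] + values[k + 1]) * (lam_grid[k + 1] - lam_grid[k])
               for k in range(len(lam_grid) - 1))


def scan_steps(samples, steps, lam_grid, dt):
    """Return the step minimizing R, the implied velocity step/dt, and all R."""
    scores = [integral_reflection(samples, step, lam_grid) for step in steps]
    best = min(range(len(steps)), key=scores.__getitem__)
    return steps[best], steps[best] / dt, scores


# Hypothetical record: a Gaussian packet (not a soliton), 241 samples.
record = [0.5 * math.exp(-(0.05 * j - 6.0) ** 2) * cmath.exp(1j * (0.05 * j - 6.0))
          for j in range(241)]
lam_grid = [0.1 * k for k in range(1, 31)]
trial_steps = [0.04, 0.05, 0.06]
best_step, velocity, scores = scan_steps(record, trial_steps, lam_grid, dt=0.5)
flat = integral_reflection([0j] * 41, 0.1, lam_grid)
```

For a record that is not a soliton the scan still returns the smallest value
on the grid, so the depth and sharpness of the minimum, not merely its
position, must be inspected before a velocity is inferred.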

An example of the $R(\delta)$ dependence, calculated for a one-soliton
profile corresponding to the eigenvalue $\lambda = 0.3 + 0.5i$, is shown in
Fig. 1 (curve A). This plot shows that at $\delta = \delta_s$ the function
$R(\delta)$ indeed has a sharp minimum (near zero), whereas on both sides
away from this point $R(\delta)$ grows rapidly.

When the complex eigenvalue $\lambda = \lambda_r + i\lambda_i$ has been
determined, the physical parameters of the soliton sought, such as the
velocity $V$, characteristic length $L$, duration $T$, and local frequency,
can be found with the help of the explicit formulas (14), (15) and (16).

Fig. 1: Integral reflection coefficient $R(\delta)$ for the exact one-soliton
profile (8) with parameter $\lambda = 0.3 + 0.5i$ (A) and for its imitation
by the localized wave packet $b_{imit}(x)$ (22) (B).

4 Discrimination of Solitons from “Linear” Wave Packets

Here we examine how well the proposed method of time series analysis can
discriminate between an actual soliton and a similar localized wave packet.
As a test we use the soliton $b_{sol}(x;\lambda)$ with eigenvalue
$\lambda = 0.3 + 0.5i$. The result of the calculation (curve A in Fig. 1)
shows that the dependence $R(\delta)$ has an evident minimum, reaching zero
at the same sampling step $\delta = \delta_s$ with which the raw function
$b_{sol}(x;\lambda)$ was sampled. In turn, a linear wave packet may be
described by an imitating function of the following form:

$$ b_{imit}(x) = A\exp\left[-\gamma(x - x_0)^2\right]\left[\cos(\theta - \theta_c) + i\sin(\theta - \theta_s)\right], \qquad \theta = k(x - x_0). \tag{22} $$

This function depends on many parameters, which enabled us to choose a
function $b_{imit}(x)$ with nearly the same waveform as the exemplary soliton
$b_{sol}(x;\lambda)$. The values of the parameters used for the calculation
are: $A = 3.0$, $\gamma = 1.5$, $k = 2.4$, $x_0 = -0.27$, $\theta_1 = 1.04$,
$\theta_2 = 0.54$. The waveforms of the “soliton” $b_{sol}(x)$ and “linear”
$b_{imit}(x)$ functions are shown in Fig. 2.

Fig. 2: Comparison of the soliton $b_{sol}(x)$ and the “linear” signal
$b_{imit}(x)$, which have been used for the calculation of the integral
reflection coefficient $R(\delta)$ in Fig. 1.
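
The imitating packet (22) is straightforward to generate. The short Python
sketch below is illustrative; it assumes that the Gaussian width parameter is
$\gamma = 1.5$ and that the two quoted phase constants, 1.04 and 0.54, are
the offsets entering the cosine and the sine, respectively. The function and
argument names are hypothetical.

```python
import math


def b_imit(x, amp=3.0, gamma=1.5, k=2.4, x0=-0.27, theta_c=1.04, theta_s=0.54):
    """Localized "linear" wave packet imitating the soliton waveform."""
    theta = k * (x - x0)
    envelope = amp * math.exp(-gamma * (x - x0) ** 2)
    return envelope * complex(math.cos(theta - theta_c), math.sin(theta - theta_s))


# Sample the packet on a uniform grid, as one would before the R scan.
grid = [-4.0 + 0.05 * j for j in range(161)]
packet = [b_imit(x) for x in grid]
center_value = b_imit(-0.27)
tail_value = b_imit(5.0)
```

Because the two phase offsets differ, the real and imaginary parts are not
in exact quadrature, which is what allows the packet to mimic the elliptical
polarization of the soliton field.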


The calculation of the integral reflection coefficient $R(\delta)$ for the
function $b_{imit}(x)$ has provided the following result (Fig. 1). The plot
of $R(\delta)$ for the soliton-imitating signal (22) (curve B) is essentially
different from the corresponding plot for the actual soliton (curve A).
Instead of a minimum at $\delta = \delta_s$, the coefficient $R(\delta)$ has
a plateau at a rather high level for $\delta$ near $\delta_s$. Thus, the
proposed function $R(\delta)$ for characterizing the soliton-like nature of a
signal has turned out to be very sensitive to a deviation of the signal from
a soliton waveform.


5 Influence of High-Frequency Noise on the Soliton Detector

To validate the proposed technique and show its robustness to the possible
occurrence of high-frequency noise in data, the method has been applied to
a test signal consisting of a soliton $b_{sol}(x)$ with parameter
$\lambda = 0.3 + 0.5i$ and a high-frequency interference signal
$b_{pert}(x) = a\exp(2i\pi x)$ with amplitude $a = 0.3$ (which provides a
noise/signal ratio of ~11%):

$$ b(x) = b_{sol}(x) + b_{pert}(x). \tag{23} $$
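
The quoted noise/signal ratio can be reproduced by a short calculation. The
Python sketch below is illustrative and assumes that the ratio is taken
between the interference amplitude $a$ and the peak amplitude of the
normalized soliton, $\max|\psi_{sol}| = [8(|\lambda| + \lambda_r)]^{1/2}$,
and that the interference has the form $a\exp(2i\pi x)$; the variable and
function names are hypothetical.

```python
import cmath
import math

lam = complex(0.3, 0.5)
noise_amplitude = 0.3

# Peak amplitude of the normalized one-soliton profile.
soliton_peak = math.sqrt(8.0 * (abs(lam) + lam.real))
noise_to_signal = noise_amplitude / soliton_peak


def b_pert(x, a=noise_amplitude):
    """High-frequency harmonic interference a * exp(2*pi*i*x)."""
    return a * cmath.exp(2j * math.pi * x)
```

With these assumptions the ratio comes out between 0.11 and 0.12, in line
with the value of about 11% stated above.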

The estimated reflection coefficient $R(\delta)$ for this “noisy” soliton is
shown in Fig. 3 (curve B). The comparison with the pure soliton (curve A)
shows that despite the noise the coefficient $R$ still has an evident minimum
in the vicinity of $\delta = \delta_s$.

Fig. 3: Comparison of the reflection coefficient $R(\delta)$ for the pure
soliton (A) and the soliton with high-frequency harmonic noise (B).

Fig. 4: An example of finding the eigenvalue $\lambda$ for a one-soliton
profile perturbed by high-frequency harmonic interference. The value of
$\lambda$ for the exact one-soliton profile is marked by a cross.

The calculation of the eigenvalues $\lambda$ for the “noisy” soliton (23) has
been made for the sampling scale $\delta_s$ with the use of the algorithm
described above. Figure 4 shows the zero-level lines
$\mathrm{Re}\,s_1(\lambda) = 0$ and $\mathrm{Im}\,s_1(\lambda) = 0$; their
intersection is the complex eigenvalue $\lambda$ sought. For a relatively
weak noise amplitude, $a = 0.3$, the perturbed eigenvalue does not shift far
from the nominal eigenvalue for the pure soliton $b_{sol}(x)$ (marked by a
cross).

The sensitivity of the method to the noise amplitude $a$ is demonstrated
by the results of the analysis of several “noisy” soliton profiles (23)
(Fig. 5). The calculations of $R(\delta)$ and $\lambda$ show that even at a
relatively high level of the interference signal, $a = 0.45$, when the
$R(\delta)$ minimum near $\delta_s$ is rather unclear, the eigenvalue
$\lambda$ nevertheless does not shift too far from the “actual” unperturbed
value $\lambda = 0.3 + 0.5i$. Thus, the proposed method is sufficiently
robust with regard to high-frequency noise.




Fig. 5: The dependence of the results of the analysis of the perturbed
soliton profile (23) on the noise amplitude $a$ (indicated near the relevant
curves) and the results of the calculation of the eigenvalue $\lambda$ for
the same noise levels (asterisks in the bottom plot). The value of $\lambda$
for the exact one-soliton profile is marked by a cross.


6 Possible Applications and Further Studies

The method of the integral reflection coefficient can be applied, strictly
speaking, to any soliton solutions of integrable nonlinear wave equations.
The localized solutions of more general non-integrable nonlinear wave
equations, which may be called “soliton-like” or “coherent structures”,
cannot be directly found by this technique. However, analysis of the
observational data with the ST can provide information on how strongly
viscosity and other complicating factors distort the soliton structure of
nonlinear waves in reality.

We have elaborated this technique for DNLS because this equation de¬ 
scribes a wide range of nonlinear phenomena in space plasma. The feature of 
the DNLS soliton is the consistent variations of both amplitude and phase, 
so the identification of this type of solitons by only its amplitude wave form 
(e.g., by least square approximation) is not sufficient. 

The suggested method is not limited to the treatment of a one-soliton profile
$b(x)$, but may be applied to the analysis of more complicated events with
several interacting solitons. Figures 6 and 7 show examples for the $N = 3$
case. The function $b_{sol}(x)$, shown in Fig. 6, corresponds to the
interaction of three nearby solitons with parameters
$\lambda_1 = 0.4 + 0.3i$, $\lambda_2 = 0.5i$, and $\lambda_3 = -0.4 + 0.4i$.
The bottom panel shows the interference signal with amplitude $a = 0.1$ and
the absolute value of the function $b_{sol}(x)$. Figure 7 presents the
results of the calculation of the soliton parameters. One can see that even
in the case of a complicated profile composed of several interacting solitons
and high-frequency noise the soliton parameters can be reliably retrieved.

Fig. 6: The function $b_{sol}(x)$, describing the interaction of three nearby
solitons with parameters $\lambda_1 = 0.4 + 0.3i$, $\lambda_2 = 0.5i$, and
$\lambda_3 = -0.4 + 0.4i$ (upper panel); comparison of the noise with
amplitude $a = 0.1$ and the modulus of the function $b_{sol}(x)$ (bottom
panel).

Fig. 7: Calculation of the soliton component for the 3-soliton profile with
high-frequency interference shown in Fig. 6. The noise amplitude is
$a = 0.1$.

The DNLS solitons are balanced nonlinear objects which are robust with
respect to disturbances and noise. Therefore the retrieval of the soliton
component from a signal is of fundamental importance. At the same time, the
component of a nonlinear wave corresponding to the continuous spectrum
(“radiation”) is prone to dispersion and hardly distinguishable from noise.
Therefore, we have not tried to retrieve the radiation component.


7 Conclusion 

When the observational conditions enable one to acquire a statistically
significant number of relationships between the basic parameters (amplitude,
duration, etc.) of the expected soliton signals, the simple test for soliton
identification, based on their comparison with the relevant relationships
from soliton theory, can be applied. For a case study, the method of
detecting a soliton and determining its parameters based on the scattering
transform may be recommended. We have constructed an algorithm for the
numerical solution of the direct scattering problem for the linear system
(17). This system is associated in the inverse scattering technique with the
DNLS equation (6).

The integral reflection coefficient, which drops steeply (theoretically to
zero) when a signal is close to the N-soliton waveform, has been used as a
detector of the DNLS solitons. Application of this technique to numerically
simulated signals showed that it is more efficient than the standard FT and
can be developed into a practical tool for the analysis of outputs from
nonlinear systems. The technique effectively discriminates N-soliton
solutions (N = 1-5) from non-soliton isolated disturbances (Gaussian
packets). Examples of the Fourier spectra of solitons given in [15] show
that standard spectral analysis gives practically no evidence of the
occurrence of soliton structure in time series. However, a wave envelope
that seems complicated to the FT may be a superposition of just a few
solitons, easily retrievable with the proposed method.

This approach seems promising for the analysis of nonlinear signals in
space physics, often detected in the solar wind, magnetosheath, auroral region,
etc. The method enables one to determine from single-point observations
the basic parameters of the soliton component of a disturbance, such as
velocity, amplitude, duration, etc. A similar approach, after a minor
modification, can be applied to the detection of solitons described by other
integrable nonlinear equations [21]. As a next step, we will apply the developed
technique to the data of satellite observations of electromagnetic disturbances
in the near-Earth environment.
Abstract. In this chapter we present a nonlinear enhancement of a linear
method, the singular system analysis (SSA), which can identify potentially
predictable or relatively regular processes, such as cycles and oscillations, in
a background of colored noise. The first step in the distinction of a signal
from noise is a linear transformation of the data provided by the SSA. In the
second step, the dynamics of the SSA modes is quantified in a general,
nonlinear way, so that dynamical modes are identified which are more regular, or
better predictable, than linearly filtered noise. A number of oscillatory modes
are identified in data reflecting solar and geomagnetic activity and climate
variability, some of them sharing common periods.

Keywords: signal detection, statistical testing, Monte Carlo SSA, sunspots, 
geomagnetic activity, NAO, air temperature, solar-terrestrial relations 


1 Introduction 

The quest for uncovering physical mechanisms underlying experimental data
in order to understand, model, and predict complex, possibly nonlinear
processes, such as those studied in geophysics, in many cases starts with an
attempt to identify trends, oscillatory processes and/or other potentially
deterministic signals in a noisy environment. The distinction of a relatively
regular part of the total variability of a complex natural process can be a key
to understanding not only such a process itself, but also its interactions with
other processes or phenomena, if they possess, for instance, oscillations on a
similar temporal scale.
Singular system (or singular spectrum) analysis (SSA) [1, 2, 3] in its
original form (closely related to the principal component analysis or
Karhunen-Loève decomposition) is a method for the identification and
distinction of important information in multivariate data from noise. It is
based on an orthogonal decomposition of a covariance matrix of the multivariate
data under study. The SSA provides an orthogonal basis onto which the data can
be transformed, thus making the individual data components (“modes”) linearly
independent. Each of the orthogonal modes (projections of the original data
onto the new orthogonal basis vectors) is characterized by its variance, which
is given by the related eigenvalue of the covariance matrix. Here, we will deal
with a univariate version of SSA (which, however, can be generalised into a
multivariate version, see, e.g., [4]) in which the analyzed data is a
univariate time series and the decomposed matrix is a time-lag covariance
matrix, i.e., instead of several components of multivariate data, a time series
and its time-lagged versions are considered. This type of SSA application,
which has frequently been used especially in the field of meteorology and
climatology [5, 6, 7, 8, 9], can provide a decomposition of the studied time
series into orthogonal components (modes) with different dynamical properties.
Thus, “interesting” phenomena such as slow modes (trends) and regular or
irregular oscillations (if present in the data) can be identified and retrieved
from the background of noise and/or other “uninteresting” non-specified
processes.

In the traditional SSA, the distinction of “interesting” components
(signal) from noise is based on finding a threshold (jump-down) to a “noise
floor” in a sequence of eigenvalues given in descending order. This approach
might be problematic if the signal-to-noise ratio is not sufficiently large, or
if the noise present in the data is not white but “colored”. For such cases,
statistical approaches utilizing Monte Carlo simulation techniques have been
proposed [6, 10] for reliable signal/noise separation. The particular case of
Monte Carlo SSA (MCSSA) which considers the “red” noise usually present in
geophysical data has been introduced by Allen & Smith [11]. In this chapter, we
present and apply an extension of the Monte Carlo singular system analysis
based on evaluating and testing the regularity of the dynamics of the SSA
modes. In our approach, we retain the decomposition exploiting the linear
covariance structure of the data; in the testing (detection) part of
the method, however, we evaluate the regularity of the dynamics of the SSA
modes using a measure of general, i.e., nonlinear dependence. The latter gives
a clue for inferring whether the studied data contain a component which is more
regular and predictable, in a general, nonlinear sense, than linearly filtered
noise. Attempts to generalize SSA-like approaches to account for nonlinear
dependence structures are also known [12, 13, 14, 15]; they are, however, not
considered in this chapter.
2 Monte Carlo Singular System Analysis and its Enhancement

2.1 The Basic Univariate Singular System Analysis 

Let a univariate time series $\{y(i)\}$, $i = 1, \dots, N_0$, be a realization of a
stochastic process $\{Y(i)\}$ which is stationary and ergodic. A map into a space
of n-dimensional vectors $\mathbf{x}(i)$ with components $x_k(i)$, where
$k = 1, \dots, n$, is given as

$$x_k(i) = y(i + k - 1). \tag{1}$$

The sequence of the vectors $\mathbf{x}(i)$, $i = 1, \dots, N$, where
$N = N_0 - (n - 1)$, is usually referred to as the $n \times N$ trajectory matrix
$\mathbf{X} = \{x_k(i)\}$; the number n of the constructed components is called
the embedding dimension, or the length of the (embedding) window. Suppose that
the n-dimensional time series (the trajectory matrix $\mathbf{X}$) results from
a linear combination of m different dynamical modes, $m < n$. Then, in an ideal
case, the rank of the trajectory matrix is rank($\mathbf{X}$) = m, and
$\mathbf{X}$ can be transformed into a matrix with only m nontrivial linearly
independent components. In the univariate SSA, it is supposed that this
procedure decomposes the original series $\{y(i)\}$ into a sum of several
components and noise. Exceptional care must be taken when the trajectory
matrix $\mathbf{X}$ is constructed from a time series possibly containing
short-range correlated or nonlinear signals such as chaotic signals. The
emergence of additional, linearly independent modes when the lags used in the
construction of the trajectory matrix are larger than the correlation length of
such a signal has been discussed in [16].

Instead of the $n \times N$ matrix $\mathbf{X}$, it is more convenient to
decompose the symmetric $n \times n$ matrix $\mathbf{C} = \mathbf{X}^T \mathbf{X}$,
since rank($\mathbf{X}$) = rank($\mathbf{C}$). The elements of the covariance
matrix $\mathbf{C}$ are

$$c_{kl} = \frac{1}{N} \sum_{i=1}^{N} x_k(i)\, x_l(i), \tag{2}$$

where $1/N$ is the proper normalization and the components $x_k(i)$,
$i = 1, \dots, N$, are supposed to have zero mean. The symmetric matrix
$\mathbf{C}$ can be decomposed as

$$\mathbf{C} = \mathbf{V} \boldsymbol{\Sigma} \mathbf{V}^T, \tag{3}$$

where $\mathbf{V} = \{v_{ij}\}$ is an $n \times n$ orthonormal matrix,
$\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n)$, and
$\sigma_i$ are the non-negative eigenvalues of $\mathbf{C}$, by convention given
in descending order $\sigma_1 > \sigma_2 > \dots > \sigma_n$. If
rank($\mathbf{C}$) = m < n, then

$$\sigma_1 > \dots > \sigma_m > \sigma_{m+1} = \dots = \sigma_n = 0. \tag{4}$$

In the presence of noise, however, all eigenvalues are positive, and the relation
(4) takes the following form [17]:

$$\sigma_1 > \dots > \sigma_m > \sigma_{m+1} > \dots > \sigma_n > 0. \tag{5}$$

Then, the modes

$$\xi_k(i) = \sum_{l=1}^{n} v_{lk}\, x_l(i), \tag{6}$$

for $k = 1, \dots, m$ are considered as the “signal” part, and the modes
$\xi_k(i)$, $k = m + 1, \dots, n$, are considered as the noise part of the
original time series. The “signal” modes can be used to reconstruct the
denoised signal $\tilde{x}_k(i)$ as

$$\tilde{x}_k(i) = \sum_{l=1}^{m} v_{kl}\, \xi_l(i). \tag{7}$$

Of course, the original time series $x_k(i)$ can be reconstructed back from the
modes as

$$x_k(i) = \sum_{l=1}^{n} v_{kl}\, \xi_l(i). \tag{8}$$

In the latter relation, the decomposition (8), the modes $\xi_k(i)$ can also be
interpreted as time-dependent coefficients and the orthogonal vectors
$\mathbf{v}_k = \{v_{lk}\}$ as basis functions, usually called the empirical
orthogonal functions (EOFs).
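
The decomposition of this subsection can be illustrated with a short numerical sketch. It is not the authors' implementation: it assumes NumPy, uses `numpy.linalg.eigh` in place of a dedicated decomposition routine, stores the trajectory matrix with one row per lag (so the lag-covariance matrix is formed as `X @ X.T / N`), and the function name `ssa_decompose`, the test series and the choice m = 2 are hypothetical.

```python
import numpy as np

def ssa_decompose(y, n):
    """Univariate SSA of the series y with embedding window n.

    Returns the eigenvalues (descending), the EOFs (columns of V),
    the modes xi with shape (n, N), and the centred trajectory matrix X.
    """
    y = np.asarray(y, dtype=float)
    N = len(y) - (n - 1)
    # Trajectory matrix: row k holds y(i + k), the 0-based form of Eq. (1)
    X = np.stack([y[k:k + N] for k in range(n)])
    X = X - X.mean(axis=1, keepdims=True)   # components with zero mean
    C = (X @ X.T) / N                        # n x n lag-covariance matrix, Eq. (2)
    sigma, V = np.linalg.eigh(C)             # eigh returns ascending eigenvalues
    order = np.argsort(sigma)[::-1]          # reorder to descending
    sigma, V = sigma[order], V[:, order]
    xi = V.T @ X                             # modes, Eq. (6)
    return sigma, V, xi, X

# Illustrative data (an assumption): a noisy sine of period 50
rng = np.random.default_rng(0)
t = np.arange(500)
y = np.sin(2 * np.pi * t / 50) + 0.5 * rng.standard_normal(t.size)

sigma, V, xi, X = ssa_decompose(y, n=20)
X_full = V @ xi                   # Eq. (8): all n modes give back X
X_denoised = V[:, :2] @ xi[:2]    # Eq. (7) with an assumed m = 2
```

Because the EOFs are orthonormal, the modes are mutually uncorrelated and the variance of mode k equals the eigenvalue `sigma[k]`, which is the property the eigenvalue-based tests rely on.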

2.2 Monte Carlo Singular System Analysis 

The clear signal/noise distinction based on the eigenvalues
$\sigma_1, \sigma_2, \dots, \sigma_n$ can only be obtained in a particularly
idealized situation when the signal/noise ratio is large enough and the
background consists of white noise. In many geophysical processes, however,
so-called “red” noise with a power spectrum of the $1/f^\alpha$ (power-law)
type is present [11]. Its SSA eigenspectrum also has the $1/f^\alpha$
character [18], i.e., the eigenspectrum of red noise is equivalent to a coarsely
discretized power spectrum, where the number of frequency bins is given by
the embedding dimension n. The eigenvalues related to the slow modes are
much larger than the eigenvalues of the modes related to higher frequencies.
Thus, in the classical SSA approach applied to red noise, the eigenvalues of
the slow modes might incorrectly be interpreted as a (nontrivial) signal, or, on
the other hand, a nontrivial signal embedded in red noise might be neglected
if its variance is smaller than the slow-mode eigenvalues of the background red
noise. Therefore, Allen & Smith [11] proposed comparing the SSA spectrum
of the analyzed signal with the SSA spectrum of a red-noise model fitted to
the studied data. Such a red-noise process can be modeled by using an AR(1)
model (autoregressive model of the first order):

$$u(i) - \bar{u} = \alpha\,(u(i - 1) - \bar{u}) + \gamma\, z(i), \tag{9}$$

where $\bar{u}$ is the process mean, $\alpha$ and $\gamma$ are process
parameters, and $z(i)$ is Gaussian white noise with zero mean and unit variance.
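
A minimal sketch of the AR(1) model (9) may help to fix the roles of its parameters. It assumes NumPy; the function names `ar1_series` and `fit_ar1`, the burn-in length, and the plain least-squares fit are illustrative assumptions (Allen & Smith [11] discuss the estimation of the parameters in more detail, which is not reproduced here).

```python
import numpy as np

def ar1_series(length, alpha, gamma, mean=0.0, burn_in=500, rng=None):
    """Generate u(i) - mean = alpha * (u(i-1) - mean) + gamma * z(i)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(length + burn_in)   # Gaussian white noise z(i)
    u = np.empty(length + burn_in)
    u[0] = mean
    for i in range(1, length + burn_in):
        u[i] = mean + alpha * (u[i - 1] - mean) + gamma * z[i]
    return u[burn_in:]                          # drop the initial transient

def fit_ar1(y):
    """Least-squares estimates of the mean, alpha, gamma, and the residuals."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    d = y - mean
    alpha = np.dot(d[1:], d[:-1]) / np.dot(d[:-1], d[:-1])
    resid = d[1:] - alpha * d[:-1]              # estimated innovations
    return mean, alpha, resid.std(), resid

# Illustrative parameter values (assumptions, not taken from the text)
u = ar1_series(5000, alpha=0.9, gamma=1.0, rng=np.random.default_rng(1))
mean_hat, alpha_hat, gamma_hat, resid = fit_ar1(u)
```

The closer `alpha` is to 1, the "redder" the noise, i.e., the more of its variance is concentrated in the slow modes.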
In order to correctly detect a signal in red noise, we will apply the following
approach, inspired by Allen & Smith [11]:

First, the eigenvalues are plotted not according to their values, but
according to a frequency associated with a particular mode (EOF), i.e., the
eigenspectrum in this form becomes a sort of (coarsely) discretized power
spectrum in general, not only in the case of red noise (when the eigenspectra
naturally have this form, as mentioned above).

Second, the eigenspectrum obtained from the studied data set is compared,
in a frequency-by-frequency way, with the eigenspectra obtained from a set of
realizations of an appropriate noise model (such as the AR(1) model (9)), i.e.,
an eigenvalue related to a particular frequency bin obtained from the data
is compared with a range of eigenvalues related to the same frequency bin,
obtained from the set of so-called surrogate data, i.e., data artificially
generated according to the chosen noise model (null hypothesis) [11, 19, 20, 21].
Allen & Smith [11] also discuss other relevant approaches for comparing the
eigenvalues from the tested data and the surrogates.

The detection of a nontrivial signal in an experimental time series becomes
a statistical test in which the null hypothesis that the experimental data were
generated by a chosen noise model is tested. When one or more eigenvalues
associated with some frequency bins differ with statistical significance from
the ranges of the related noise-model eigenvalues, one can infer that the
studied data cannot be fully explained by the considered null hypothesis (noise
model) and could contain an additional (nontrivial) signal. This is a rough
sketch of the approach, for which we will use the term Monte Carlo SSA
(MCSSA), as coined by Allen & Smith [11] (see [11], where a detailed account
of the MCSSA approach with analyses of various levels of null hypotheses is
also given), although the same term was earlier used for other SSA methods,
which considered a white noise background [6, 10].

2.3 Enhanced MCSSA: Testing Dynamics of the SSA Modes 

The MCSSA described above is a sophisticated technique. However, it still
assumes a very simple model, i.e., that the signal of interest has been linearly
added to a specified background noise. It thus relies on the variance in the
frequency band characteristic of the sought signal being significantly larger
than the typical variance in this frequency band obtained from the considered
noise model. If the studied signal has a more complicated origin, e.g., when an
oscillatory mode is embedded into a background process without significantly
increasing the variance in a particular frequency band, the standard MCSSA can
fail. In order to be able to detect any interesting dynamical mode independent
of its (relative) variance, Palus & Novotna [22] have proposed to also test the
dynamical properties of the SSA modes against the modes obtained from
surrogate data. From this idea, the question arises of how we can characterize
dynamics in a simple, computationally effective way.



M. Palus, D. Novotna 


Consider a complex, dynamic process evolving in time. A series of
measurements done on such a system in consecutive instants of time
$t = 1, 2, \dots$ is usually called a time series $\{y(t)\}$. Consider further
that the temporal evolution of the studied system is not completely random,
i.e., that the state of the system at time $t$ in some way depends on the state
in which the system was at time $t - \tau$. The strength of such a dependence
per unit time delay $\tau$, or, inversely, a rate at which the system “forgets”
information about its previous states, can be an important quantitative
characterization of temporal complexity in the system’s evolution. The time
series $\{y(t)\}$, which is a record of (a part of) the system’s temporal
evolution, can be considered as a realization of a stochastic process, i.e., a
sequence of stochastic variables. Uncertainty in a stochastic variable is
measured by its entropy. The rate with which the stochastic process “produces”
uncertainty is measured by its entropy rate. The concept of entropy rates is
common to the theory of stochastic processes as well as to information theory,
where the entropy rates are used to characterize information production by
information sources [23].

Alternatively, the time series $\{y(t)\}$ can be considered as a projection of
a trajectory of a dynamical system, evolving in some measurable state space.
A. N. Kolmogorov, who introduced the theoretical concept of classification of
dynamical systems by information rates, was inspired by information theory
and generalized the notion of the entropy of an information source [24]. The
Kolmogorov-Sinai entropy (KSE) [24, 25, 26] is a topological invariant,
suitable for classification of dynamical systems or their states, and is related
to the sum of the system’s positive Lyapunov exponents (LE) according to the
theorem of Pesin [27].

Thus, the concept of entropy rates is common to theories based on
philosophically opposite assumptions (randomness vs. determinism) and is
ideally applicable for a characterization of complex geophysical processes,
where possible deterministic rules are always accompanied by random influences.
However, possibilities to compute the exact entropy rates from experimental
data are limited to a few exceptional cases. Therefore Palus [28] has
proposed “coarse-grained entropy rates” (CERs) instead. The CERs are
relative measures of regularity and predictability of analyzed time series and
are based on coarse-grained estimates of information-theoretic functionals.
In the simplest case, applied here, we use the so-called mutual information.
The mutual information $I(X;Y)$ of two random variables $X$ and $Y$ is given
by $I(X;Y) = H(X) + H(Y) - H(X,Y)$, where the entropies $H(X)$, $H(Y)$,
$H(X,Y)$ are defined in the usual Shannonian sense [23]:

Let $X$ and $Y$ be random variables with sets of values $\Xi$ and $\Upsilon$,
respectively, probability distribution functions (PDF) $p(x)$, $p(y)$, and a
joint PDF $p(x,y)$. The entropy $H(X)$ of a single variable, say $X$, is
defined as

$$H(X) = -\sum_{x \in \Xi} p(x) \log p(x), \tag{10}$$

and the joint entropy $H(X,Y)$ of $X$ and $Y$ is

$$H(X,Y) = -\sum_{x \in \Xi} \sum_{y \in \Upsilon} p(x,y) \log p(x,y). \tag{11}$$

The mutual information $I(X;Y)$ then can be expressed as

$$I(X;Y) = \sum_{x \in \Xi} \sum_{y \in \Upsilon} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}. \tag{12}$$
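
As a short consistency check, the sum form of the mutual information follows from the definition $I(X;Y) = H(X) + H(Y) - H(X,Y)$ once the marginal distributions are written as sums of the joint one, $p(x) = \sum_y p(x,y)$ and $p(y) = \sum_x p(x,y)$. The sums below run over the value sets of $X$ and $Y$:

```latex
\begin{aligned}
I(X;Y) &= -\sum_{x} p(x)\log p(x) - \sum_{y} p(y)\log p(y)
          + \sum_{x}\sum_{y} p(x,y)\log p(x,y) \\
       &= \sum_{x}\sum_{y} p(x,y)\,\bigl[\log p(x,y) - \log p(x) - \log p(y)\bigr] \\
       &= \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} .
\end{aligned}
```

In particular, $I(X;Y) = 0$ when $X$ and $Y$ are independent, since then $p(x,y) = p(x)\,p(y)$ and every logarithm vanishes.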

A detailed account of the relations between entropy rates and
information-theoretic functionals is given in [28, 29]. For a time series
$\{x(t)\}$, considered as a realization of a stationary and ergodic stochastic
process $\{X(t)\}$, $t = 1, 2, 3, \dots$, we compute the mutual information
$I(x; x_\tau)$ as a function of the time lag $\tau$. We denote $x(t)$ as $x$ and
$x(t + \tau)$ as $x_\tau$. To define the simplest form of the CER, let us find
$\tau_{max}$ such that for $\tau' > \tau_{max}$, $I(x; x_{\tau'}) \approx 0$ for
the analysed datasets. Then, we define the norm of the mutual information

$$\|I(x; x_\tau)\| = \frac{\Delta\tau}{\tau_{max} - \tau_{min} + \Delta\tau} \sum_{\tau = \tau_{min}}^{\tau_{max}} I(x; x_\tau) \tag{13}$$

with $\tau_{min} = \Delta\tau = 1$ sample as a usual choice. The CER $h^1$ is
then defined as

$$h^1 = I(x; x_{\tau_0}) - \|I(x; x_\tau)\|. \tag{14}$$

It has been shown that the CER $h^1$ provides the same classification of states
of chaotic systems as the exact KSE [28]. Since usually $\tau_0 = 0$ and
$I(x; x) = H(X)$, which is given by the marginal probability distribution
$p(x)$, the sole quantitative descriptor of the underlying dynamics is the
mutual information norm (13), which we will call the regularity index. Since the
mutual information $I(x; x_\tau)$ measures the average amount of information
contained in the process $\{X\}$ about its future $\tau$ time units ahead, the
regularity index $\|I(x; x_\tau)\|$ gives an average measure of predictability
of the studied signal and is inversely related to the signal’s entropy rate,
i.e., to the rate at which the system (or process) producing the studied signal
“forgets” information about its previous states.
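
For orientation, with the usual choice $\tau_{min} = \Delta\tau = 1$ sample the prefactor of the norm (13) becomes $1/\tau_{max}$, so the regularity index is simply the arithmetic mean of the lagged mutual information, and with $\tau_0 = 0$ the CER (14) is the marginal entropy minus that mean:

```latex
\|I(x; x_\tau)\| = \frac{1}{\tau_{max}} \sum_{\tau = 1}^{\tau_{max}} I(x; x_\tau),
\qquad
h^{1} = H(X) - \frac{1}{\tau_{max}} \sum_{\tau = 1}^{\tau_{max}} I(x; x_\tau).
```

A signal whose lagged mutual information decays slowly thus has a large regularity index and a small $h^1$, while for white noise the mean is close to zero and $h^1$ approaches $H(X)$.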

There are plenty of approaches to estimating the mutual information
$I(x; x_\tau)$ [30]. If we are not interested in an exact value, but rather in a
relative comparison of values obtained from the tested data and from the
surrogate set, a simple box-counting approach based on marginal
equiquantization [21, 28, 29] is satisfactory. The latter means that the
marginal boxes (bins) are not defined equidistantly, but in such a way that
there is approximately the same number of data points in each marginal bin.
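
The box-counting estimate with marginal equiquantization and the resulting regularity index can be sketched as follows. This is an illustration under stated assumptions, not the estimator of [21, 28, 29]: it assumes NumPy, quantizes the whole series once by ranks, uses natural logarithms, and the function names as well as the defaults `q = 8` bins and `tau_max = 50` are hypothetical choices.

```python
import numpy as np

def equiquantize(x, q):
    """Map samples to bin labels 0..q-1 with (nearly) equal occupancy."""
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * q) // len(x)

def mutual_information(a, b, q):
    """Box-counting estimate of I(a; b), in nats, from bin labels a and b."""
    joint = np.zeros((q, q))
    np.add.at(joint, (a, b), 1.0)            # 2-D histogram of label pairs
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)    # marginal of a, shape (q, 1)
    pb = joint.sum(axis=0, keepdims=True)    # marginal of b, shape (1, q)
    nz = joint > 0                           # empty boxes contribute nothing
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def regularity_index(x, q=8, tau_max=50):
    """Mean of I(x; x_tau) over tau = 1..tau_max (norm with tau_min = dtau = 1)."""
    labels = equiquantize(np.asarray(x, dtype=float), q)
    mi = [mutual_information(labels[:-tau], labels[tau:], q)
          for tau in range(1, tau_max + 1)]
    return float(np.mean(mi))

# Illustrative comparison: a periodic signal versus white noise
t = np.arange(2000)
r_sine = regularity_index(np.sin(2 * np.pi * t / 50))
r_noise = regularity_index(np.random.default_rng(0).standard_normal(2000))
```

In such a comparison the periodic signal yields a much larger index than the noise; since the box-counting estimate is biased for finite samples, only relative comparisons between data and surrogates processed in the same way are meaningful.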
2.4 Implementation of the Enhanced MCSSA

We realize the enhanced MCSSA as follows:

1. The studied time series undergoes SSA as briefly described above or in
[31], i.e., using an embedding window of length n, the n × n lag-correlation
matrix C is decomposed using the SVDCMP routine [32]. In the
eigenspectrum, the position of each eigenvalue on the abscissa is given by the
dominant frequency associated with the related EOF, i.e., detected in the
related mode. That is, the studied time series is projected onto the
particular EOF, the power spectrum of the projection (mode) is estimated, and
the frequency bin with the highest power is identified. This spectral
coordinate is mapped onto one of the n frequency bins, which equidistantly
divide the abscissa of the eigenspectrum.

2. An AR(1) model is fitted to the series under study, and the residuals are 
computed. 

3. The surrogate data are generated using the above AR(1) model, where
“scrambled” (randomly permuted in temporal order) residuals are used
as innovations, i.e., the noise term in (9).

4. Each realization of the surrogates undergoes SSA as described in step 
1. Then, the eigenvalues for the whole surrogate set are sorted in each 
frequency bin, and the values for the 2.5th and 97.5th percentiles are 
found. In the eigenspectra, the 95% range of the surrogates’ eigenvalue 
distribution is illustrated by a horizontal bar between the above percentile 
values. 

5. For each frequency bin, the eigenvalue obtained from the studied data is 
compared with the range of the surrogate eigenvalues. If an eigenvalue lies 
above the range given by the above percentiles, the null hypothesis of the 
AR(1) process is rejected, i.e., there is a probability p < 0.05 that such an 
eigenvalue as observed can emerge from the background of the null noise 
model. 

6. For each SSA mode (a projection of the data onto a particular EOF),
the regularity index is computed, and likewise for each SSA mode of all
the realizations of the surrogate data. The regularity indices are processed
and statistically tested in the same way as the eigenvalues. The regularity
index is based on mutual information obtained by a simple box-counting
approach with marginal equiquantization [21, 28, 29].
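
The eigenvalue part of the procedure (steps 1 to 5) can be sketched compactly as below. This is a hedged illustration rather than the authors' code: it assumes NumPy, replaces the SVDCMP routine by `numpy.linalg.eigh`, estimates the dominant frequency of each mode from a plain periodogram, fits the AR(1) model by least squares, and uses hypothetical function names and illustrative parameter values. The regularity-index test of step 6 would follow the same pattern with the eigenvalue replaced by the regularity index of each mode.

```python
import numpy as np

def ssa_binned(y, n):
    """SSA eigenvalues paired with the frequency bin of each mode (step 1)."""
    y = np.asarray(y, dtype=float)
    N = len(y) - (n - 1)
    X = np.stack([y[k:k + N] for k in range(n)])
    X -= X.mean(axis=1, keepdims=True)
    sigma, V = np.linalg.eigh(X @ X.T / N)
    xi = V.T @ X                                    # SSA modes
    power = np.abs(np.fft.rfft(xi, axis=1)) ** 2    # periodogram of each mode
    f_dom = np.fft.rfftfreq(N)[np.argmax(power, axis=1)]
    # n equidistant bins covering the frequencies 0 .. 0.5 cycles per sample
    bins = np.minimum((f_dom / 0.5 * n).astype(int), n - 1)
    return sigma, bins

def ar1_surrogate(y, rng):
    """One AR(1) surrogate driven by scrambled residuals (steps 2 and 3)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    alpha = np.dot(d[1:], d[:-1]) / np.dot(d[:-1], d[:-1])
    e = rng.permutation(d[1:] - alpha * d[:-1])     # permuted innovations
    u = np.empty(len(d))
    u[0] = d[0]
    for i in range(1, len(d)):
        u[i] = alpha * u[i - 1] + e[i - 1]
    return u + np.mean(y)

def mcssa_test(y, n, n_surr, rng):
    """Return {bin: (data eigenvalues, 2.5th, 97.5th percentile)} (steps 4, 5)."""
    sigma, bins = ssa_binned(y, n)
    pooled = {b: [] for b in range(n)}
    for _ in range(n_surr):
        s_sigma, s_bins = ssa_binned(ar1_surrogate(y, rng), n)
        for value, b in zip(s_sigma, s_bins):
            pooled[int(b)].append(value)
    result = {}
    for b in np.unique(bins):
        if pooled[int(b)]:
            lo, hi = np.percentile(pooled[int(b)], [2.5, 97.5])
            result[int(b)] = (sigma[bins == b], lo, hi)
    return result

# Illustrative data (assumptions): a sine of period 9 in weakly red noise
rng = np.random.default_rng(0)
noise = np.zeros(2000)
innov = rng.standard_normal(2000)
for i in range(1, 2000):
    noise[i] = 0.5 * noise[i - 1] + innov[i]
y = np.sin(2 * np.pi * np.arange(2000) / 9) + noise
result = mcssa_test(y, n=40, n_surr=50, rng=rng)
significant = [b for b, (vals, lo, hi) in result.items() if vals.max() > hi]
```

A data eigenvalue above the 97.5th percentile of the surrogate eigenvalues in its frequency bin leads to the rejection of the AR(1) null hypothesis for that bin; the list `significant` collects such bins.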

When MCSSA is performed using an embedding window of length n, there are
n eigenvalues in the eigenspectrum, and n statistical tests are done. Therefore,
in general, the problem of simultaneous statistical inference should be
considered (see [21] and references therein). However, in many relevant
applications we are interested in the detection of a signal in a specific
frequency band (and not in rejecting the null hypothesis by a digression from
the surrogate range by an eigenvalue or a regularity index in any frequency
band); therefore, we will not discuss this topic here.
Rejecting the null hypothesis of the AR(1) (or another appropriate) noise
model, one can infer that there is “something more” in the data than a
realization of the null hypothesis (noise) model. The rejection based on the
eigenvalues indicates a different covariance structure than that of the noise
model used. The rejection based on the regularity index indicates that the
studied data contain a dynamically interesting signal with higher regularity and
predictability than a mode obtained by linear filtration of the considered noise
model.


3 Numerical Examples 

3.1 A Signal in AR(1) Background 

For an example of the application of the presented approach, let us consider
numerically generated data: a periodic signal with randomly variable
amplitude (Fig. 1a) mixed with a realization of an AR(1) process with a strong
slow component (Fig. 1b). The noise model used is defined as
$x_i = 0.933\,x_{i-1} + \xi_i$, where $\xi_i$ are Gaussian deviates with zero
mean and unit variance. The signal to noise ratios (i.e., the ratios of the
respective standard deviations of the signal and noise components) obtained by
mixing the signals were 1:2 (Fig. 1c) and 1:4 (Fig. 1d). The latter two series
are analyzed by the presented method.

Fig. 1: Numerically generated test data: (a) A periodic signal with randomly variable
amplitude was mixed with (b) a realization of an AR(1) process with a strong slow
component, obtaining the signal to noise ratio 1:2 (c), and 1:4 (d). Adapted from
Palus & Novotna [31].

Fig. 2: The standard eigenvalue based (a-c) and the enhanced - regularity index
based (d-f) MCSSA analysis of the numerical data, presented in Fig. 1. (a) The full
eigenspectrum and (b) the low-frequency part of the eigenspectrum - logarithms of
eigenvalues (“LOG POWER”) plotted according to the dominant frequency
associated with particular modes, for the signal to noise ratio 1:2. (c) Low frequency
part of the eigenspectrum for the signal to noise ratio 1:4. (d) The regularity spectrum
and (e) its low frequency part for the signal to noise ratio 1:2. (f) Low frequency part
of the regularity spectrum for the signal to noise ratio 1:4. Bursts - eigenvalues or
regularity indices for the analysed data; bars - 95% of the surrogate eigenvalues or
regularity index distribution, i.e., the bar is drawn from the 2.5th to the 97.5th
percentiles of the surrogate eigenvalues/regularity indices distribution. Adapted from
Palus & Novotna [31].

Detection of Oscillatory Modes

The eigenspectrum of the time series consisting of the signal (Fig. 1a)
and the AR(1) noise (Fig. 1b) in the ratio 1:2 (Fig. 1c) is presented in
Fig. 2a, where logarithms of the eigenvalues are plotted as the bursts (“LOG
POWER”). The series is treated as unknown experimental data, so that
an AR(1) model is fitted to the data and the surrogates are generated as
described above. The vertical bars in the eigenspectrum represent the
surrogate eigenvalue ranges from the 2.5th to the 97.5th percentiles, which
were obtained from 1500 surrogate realizations (here, as well as in the
following example). The eigenvalues of the AR(1) surrogates uniformly fill all
the n frequency bins (here, as well as in the following example, n = 100), while
in the case of the test data, some bins are empty and others contain one, two,
or more eigenvalues. We plot the surrogate bars only in those positions in which
eigenvalues of the analyzed data exist. Note the $1/f^\alpha$ character of the
surrogate eigenspectrum, i.e., the eigenvalues plotted against the dominant
frequency associated with the related modes are monotonically decreasing in
a $1/f^\alpha$ way. The low-frequency part of the eigenspectrum from Fig. 2a is
enlarged in Fig. 2b. The two data eigenvalues related to the frequency 0.02
(cycles per time unit) are clearly outside the range of those from the
surrogates, i.e., they are statistically significant, the null hypothesis is
rejected, and a signal not consistent with the null hypothesis is detected. A
close look at the significant modes shows that they are related to the embedded
signal from Fig. 1a: one of the modes contains the signal together with
some noise of similar frequencies, and the other includes an oscillatory mode
shifted by $\pi/2$ relative to the former one. Note that the simple SSA based
on the mutual comparison of the data eigenvalues could be misleading, since
the AR(1) noise itself “produces” two or three eigenvalues which are larger
than the two eigenvalues related to the signal embedded in the noise.

The same analysis applied to the series with the signal/noise ratio 1:4
(Fig. 2c), however, fails to detect the embedded signal: all eigenvalues
obtained from the test data are well confined between the 2.5th and
97.5th percentiles of the surrogate eigenvalue distributions. When the test
based on the regularity index is applied to the mixture with the signal to noise
ratio 1:2 (Fig. 2d,e), the regularity index of one data mode is found to be
significantly higher than the related surrogate indices. It was obtained from
the mode related to the frequency bin 0.02, as in the case of the significant
eigenvalues in Fig. 2a,b. This is the mode which contains the embedded
signal (Fig. 1a) together with some noise of similar frequencies. The orthogonal
mode, related to the same frequency bin, which has a variance comparable to
the former one (Fig. 2a,b), has its regularity index close to the 97.5th
percentile of the surrogate regularity indices distribution. In other words, if
a (nearly) periodic signal is embedded in a (colored) noise background, the SSA
approach, in principle, is able to extract this signal together with some noise
of neighboring frequencies, and produces an orthogonal “ghost” mode which has
a comparable variance. However, the dynamical properties of the ghost mode are
closer to those of the modes obtained from the pure noise (null model), as
measured by the regularity index (13). Nevertheless, the regularity index used
as a test statistic in the MCSSA manner is able to detect the embedded signal
with a high statistical significance in this case (signal:noise = 1:2), as well
as in the case of the signal to noise ratio 1:4 (Fig. 2f), when the standard
(variance-based) MCSSA failed (Fig. 2c). In the latter case, the orthogonal
“ghost” mode did not appear, and the regularity index of the signal mode was
lower than in the previous case, since the mode contains a larger portion of
the isospectral noise. However, the signal mode regularity index is still safely
above the surrogate bar, i.e., significant with p < 0.05 (Fig. 2f).

3.2 A Signal in Multifractal Background 

As a more complex example we “embed” the test signal (Fig. 1a) into a
realization of a multifractal process (Fig. 3b) generated by a log-normal random
cascade on a wavelet dyadic tree [33] using the discrete wavelet transform [32].
Using wavelet decomposition, we embed the most significant part of the signal
(Fig. 1a) related to a particular wavelet scale - this wavelet-filtered signal is
illustrated in Fig. 3a. The mixing is done in the space of wavelet coefficients.



Fig. 3: Numerically generated test data: (a) The wavelet filtered signal from
Fig. 1a was embedded into (b) a realization of a multifractal process, obtaining
the ratio of related wavelet coefficients 1:2 (c), and 0.5:0.5 (d). Horizontal
axis: time [samples]. Adapted from Palus & Novotna [31].
In the first case (in Fig. 3 referred to as “signal added to multifractal”), the
standard deviation (SD) of the signal wavelet coefficients is twice the SD of
the wavelet coefficients of the multifractal signal in the related scale (Fig. 3c),
i.e., the added signal deviates from the covariance structure of the “noise”
(multifractal) process. In the second case, we adjusted the SD of both sets
of wavelet coefficients to 50% of the SD of the wavelet coefficients of the
original multifractal signal in the associated scale (Fig. 3d), so that the total
variance in this scale (frequency band) does not exceed the corresponding
variance of the “clean” multifractal. It is then not surprising that the
variance-(eigenvalues)-based MCSSA test, using the AR(1) surrogate data
(Fig. 4a,b), clearly distinguishes the signal from the multifractal background
in the first case (Fig. 4a), including its orthogonal “ghosts”, while in the
second case, no eigenvalue is outside the AR(1) surrogate range except for the
slow trend mode (Fig. 4b). The AR(1) process is unable to correctly mimic the
multifractal process: the slow mode (the zero frequency bin) scores as a
significant trend over the AR(1) surrogate range, while the variance on
subsequent frequencies is overestimated (Fig. 4a,b). On the other hand, even the
AR(1) surrogate model is able to detect the added signal in the first case
(Fig. 4a). If we use realizations of the same multifractal process as the
surrogate data, the signal is detected in the first case (not presented, just
compare the bursts on frequency 0.02 in Fig. 4a and the related surrogate bar in
Fig. 4c), while in the second case, the eigenvalues-based MCSSA neglects the
signal embedded into the multifractal “noise”: all the data mixture eigenvalues
(bursts) are inside the multifractal surrogate bars (Fig. 4c). In the MCSSA
tests using the regularity index, the embedded signal is safely detected
together with its orthogonal “ghosts” and higher harmonics not only in the first
case (Fig. 4d), but also in the second case, using either the AR(1) (Fig. 4e) or
the multifractal surrogate data (Fig. 4f), when it is, from the point of view of
the covariance structure, indistinguishably embedded into the multifractal
process.


4 Detection of Irregular Oscillations in Geophysical Data 

Temperature measurements are among the longest available instrumental data
characterizing the long-term evolution of the atmosphere and climate at a
particular location. For instance, the data from the Prague-Klementinum station
have been available since 1775. On the other hand, large-scale circulation
patterns reflect a more global view of the atmospheric dynamics. The North
Atlantic Oscillation (NAO) is a dominant pattern of atmospheric circulation
variability in the extratropical Northern Hemisphere, accounting for about 60%
of the total sea-level pressure variance. The NAO has a strong effect on
European weather conditions, influencing meteorological variables including the
temperature [34]. The NAO-temperature relationship, however, is not
straightforward and its mechanism is not yet fully understood.

Fig. 4: The low frequency parts of the MCSSA eigenspectra (a-c) and regularity
spectra (d-f) for the signal embedded into a multifractal process with wavelet
coefficient ratio 1:2 (a,d) and 0.5:0.5 (b,c,e,f). Bursts - eigenvalues or
regularity indices for the analysed data; bars - 95% of the surrogate
eigenvalues or regularity index distribution obtained from the AR(1) (a,b,d,e)
and the multifractal (c,f) surrogate data. Adapted from Palus & Novotna [31].

The possible influence of solar variability on climate change has been a
subject of research for many years; however, there are still open questions and
unsolved problems (for reviews, see e.g. [35, 36, 37]). Probably the longest
historical record of solar variability is the well-known series of sunspot
numbers. After the sunspot numbers, the aa index, a time series of geomagnetic
activity, provides the longest data set of solar proxies, going back to
1868 [38]. Since no direct measurements of solar irradiance are available
before the beginning of the 1980s, the data of geomagnetic variations are used
for an additional study of solar activity, especially of irradiance.

It is of interest whether the atmospheric data, both local and global, and
the geomagnetic and solar data possess any common, repeating variability
pattern such as cycles or oscillatory modes. The enhanced MCSSA can help to
answer this question.

4.1 The Data 

The NAO index is traditionally defined as the normalized pressure difference 
between the Azores and Iceland. The NAO data used here and their description
are available at http://www.cru.uea.ac.uk/cru/data/nao.htm.

Monthly average near-surface air temperature time series from ten European
stations were used (see [31] for details), obtained from the Carbon Dioxide
Information Analysis Center Internet server
(ftp://cdiac.esd.ornl.gov/pub/ndp041), as well as a time series from the
Prague-Klementinum station for the period 1781-2002. The long-term monthly
averages were subtracted from the data, so that the annual cycle was
effectively filtered out.
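
The removal of the annual cycle described above can be illustrated with a short sketch. It assumes an evenly sampled monthly series without gaps; the function name `monthly_anomalies` and the `start_month` parameter are hypothetical and are not taken from the original study.

```python
import numpy as np

def monthly_anomalies(x, start_month=1):
    """Subtract the long-term mean of each calendar month from a monthly series.

    x           : 1-D array of monthly values without gaps (assumption)
    start_month : calendar month (1-12) of the first sample (assumption)
    """
    x = np.asarray(x, dtype=float)
    month = (np.arange(len(x)) + start_month - 1) % 12    # 0 = January
    climatology = np.array([x[month == m].mean() for m in range(12)])
    return x - climatology[month]

# Synthetic example: a strong annual cycle plus a weak linear trend
t = np.arange(240)
series = 10.0 * np.cos(2 * np.pi * t / 12) + 0.01 * t
anomalies = monthly_anomalies(series)
```

In this synthetic case the annual cycle is removed completely, while the slow trend survives in the anomalies, which is the behaviour needed for the trend (zero frequency) mode to remain testable.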

The aa-index is defined by the average, for each 3-hour period, of the 
maximum of magnetic elements from two near-antipodal mid-latitude stations 
in Australia (Melbourne) and England (Greenwich). The data spanning the 
period 1868-2005 were obtained from the World Data Centre for Solar-Terrestrial
Physics, Chilton, http://www.ukssdc.ac.uk/data/wdcc1/wdc_menu.html.

The monthly sunspot data, spanning the period 1749-2006, were obtained
from the SIDC-team, Royal Observatory of Belgium, Ringlaan 4, 1180
Brussels, Belgium, http://sidc.oma.be/DATA/monthssn.dat. 

4.2 The Results 

Figure 5 presents the results from the enhanced MCSSA for the considered 
monthly NAO index and the monthly average near-surface air temperature 
time series from Prague (Prague-Klementinum station) and Berlin, obtained 
using the embedding dimension n = 480 months. In the standard MCSSA, 
the only eigenvalue undoubtedly distinct from the surrogate range is the trend 
(zero frequency) mode in the temperature (Fig. 5b,c). Further, there are two 
modes at the frequency 0.0104 just above the surrogate bar in the Prague 
temperature and NAO test (Fig. 5a,b). These results, however, are still “on 
the edge” of significance and are not very convincing. In the case of Berlin, 
the eigenvalues of the modes at the frequency 0.0104 are confined within the 
surrogate range (Fig. 5c). 
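
The standard (eigenvalue-based) Monte Carlo test used above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Toeplitz lag-covariance estimate, a least-squares AR(1) fit, and the projection of the surrogate covariance matrices onto the data EOFs. All function names are hypothetical. The enhanced test would additionally evaluate a regularity index for each mode, which is not shown here.

```python
import numpy as np

def lag_covariance(x, n):
    """Toeplitz lag-covariance matrix (n x n) of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    c = np.array([np.dot(x[:N - k], x[k:]) / (N - k) for k in range(n)])
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return c[idx]

def fit_ar1(x):
    """Least-squares AR(1) coefficient and innovation standard deviation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    a = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    sigma = np.std(x[1:] - a * x[:-1])
    return a, sigma

def ar1_surrogate(a, sigma, N, rng, burn_in=100):
    """One realization of the fitted AR(1) process (burn-in discarded)."""
    e = rng.normal(0.0, sigma, N + burn_in)
    y = np.zeros(N + burn_in)
    for t in range(1, N + burn_in):
        y[t] = a * y[t - 1] + e[t]
    return y[burn_in:]

def mcssa_eigen_test(x, n, n_surr=200, seed=0):
    """Data eigenvalues and the 2.5th-97.5th percentile surrogate bars."""
    rng = np.random.default_rng(seed)
    evals, E = np.linalg.eigh(lag_covariance(x, n))
    order = np.argsort(evals)[::-1]            # sort by decreasing variance
    evals, E = evals[order], E[:, order]
    a, sigma = fit_ar1(x)
    surr = np.empty((n_surr, n))
    for i in range(n_surr):
        Cs = lag_covariance(ar1_surrogate(a, sigma, len(x), rng), n)
        surr[i] = np.einsum("ik,ij,jk->k", E, Cs, E)   # diag(E^T Cs E)
    lo, hi = np.percentile(surr, [2.5, 97.5], axis=0)
    return evals, lo, hi
```

A mode would be called significant in this sketch when its data eigenvalue lies above the upper end of the corresponding surrogate bar.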

Fig. 5: Enhanced MCSSA analysis of the monthly NAO index (a,d) and monthly
average near-surface air temperature series from Prague-Klementinum (b,e) and
Berlin (c,f). Low-frequency parts of eigenspectra - logarithms of eigenvalues
(“LOG POWER”) (a,b,c) and regularity index spectra (d,e,f). Bursts -
eigenvalues or regularity indices for the analysed data; bars - 95% of the
surrogate eigenvalues or regularity index distribution, i.e., the bar is drawn
from the 2.5th to the 97.5th percentiles of the surrogate
eigenvalues/regularity indices distribution. The datasets span the period
1824-2002; the embedding dimension n = 480 months was used.

A quite different picture is obtained from the analyses based on the
regularity index (Figs. 5d,e,f). Several oscillatory modes have been detected
with a high statistical significance. The distinction of the regularity indices
of these modes from the related surrogate ranges is clear, and even the
simultaneous statistical inference cannot jeopardize the significance of the
results. The significant modes in the NAO are located at the frequencies (in
cycles per month) 0.004, 0.006, 0.0104, 0.014, 0.037 and 0.049, corresponding
to the periods of 240, 160, 96, 73, 27 and 20 months, respectively. Besides the
zero frequency (trend) mode, the significant modes in the Prague temperature
are located at the frequencies 0.0104, 0.014, 0.016, 0.018, 0.025, 0.037 and
0.051, corresponding to the periods of 96, 68, 64, 56, 40, 27 and 20 months,
respectively. In the case of Berlin, there are some differences: the modes with
the periods 20, 40, 56 and 64 months are missing, while modes with periods 23,
29 and 58 months, as well as a slow mode next to the zero frequency mode,
appear. The significant modes with the periods 27, 68 and 96 months were
detected in both temperature records.
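
The conversion between the quoted frequencies (cycles per month) and periods (months) is simply a reciprocal, as the sketch below shows. The second helper illustrates how coarse the period resolution is near the 96-month mode. It assumes that the frequency bins are spaced by 1/(2n) for the embedding dimension n = 480; this spacing is an inference from the quoted periods, not something stated in the text, and both function names are hypothetical.

```python
def period_from_frequency(freq_cpm):
    """Period in months for a frequency given in cycles per month."""
    return 1.0 / freq_cpm

def bin_period(n, k):
    """Period of the k-th frequency bin, ASSUMING a bin spacing of 1/(2n)."""
    return 2.0 * n / k

n = 480  # embedding dimension in months, as used in the text

print(round(period_from_frequency(0.0104)))                # 96
print([round(bin_period(n, k), 1) for k in (9, 10, 11)])   # [106.7, 96.0, 87.3]
```

Under this assumption, the bins adjacent to the 96-month mode correspond to periods of roughly 107 and 87 months, which gives a feeling for the limited accuracy of the quoted periods.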

The modes with a period of 8 years were extracted and analysed in [31];
their mean period was estimated with higher precision as 7.8 years. Besides
these modes (and the trend mode in the temperature), the highest regularity
index was obtained for the modes with a period of 27 months (frequency
0.037). This frequency lies within the range of the quasi-biennial oscillations
(QBO). The behavior of these modes was studied in some detail in [39].

The results of the enhanced MCSSA analysis of the aa index are presented in
Fig. 6. In the standard (eigenvalue) analysis (Fig. 6a), we can see significant
modes representing the trend, i.e., the zero frequency mode, and a mode with a
frequency of 0.0073, which corresponds to the period of 136 months, i.e., to
the 11-year solar activity cycle. The analysis based on the regularity index
(Fig. 6b) confirms the previous two modes and adds two more at frequencies of
0.0104 and 0.016, corresponding to periods of 96 and 64 months.

Fig. 6: Enhanced MCSSA analysis of the monthly aa index. The low-frequency part
of the eigenspectrum - logarithms of eigenvalues (“LOG POWER”) (a) and the
regularity index spectrum (b). Bursts - eigenvalues or regularity indices for
the analysed data; bars - 95% of the surrogate eigenvalues or regularity index
distribution, i.e., the bar is drawn from the 2.5th to the 97.5th percentiles
of the surrogate eigenvalues/regularity indices distribution. The dataset spans
the period 1868-2005; the embedding dimension n = 480 months was used.

The mode with the period of 96 months, or 8 years, has been detected in all
the data sets analyzed above, i.e., in the near-surface air temperature, the
NAO index, and the geomagnetic aa index. The time series of the modes extracted
using the SSA, i.e., by projecting the input data on the particular EOF, are
presented in Fig. 7a,c,e. When the modes are extracted using SSA, the timing of
the modes is uncertain within the embedding window, and a part of the data
equal in length to the embedding window is lost. We positioned the SSA modes on
the time axis by maximizing the cross-correlation between the mode and the
original data. This approach, however, does not always give unambiguous
results. Therefore, Palus & Novotna [39] studied the possible relationships of
the QBO modes from the temperature and NAO index not only using the
SSA-extracted modes, but also using modes extracted from the data by means of
the complex continuous wavelet transform (CCWT) [40]. Here we compare the modes
with the period 96 months extracted by SSA (Fig. 7a,c,e) with the modes
obtained by using CCWT with the central wavelet frequency set to the period of
96 months (Fig. 7b,d,f). The SSA mode and the wavelet mode obtained from the
Prague temperature data (Fig. 7a,b, respectively) are shifted by π (half of
the period); otherwise their agreement is very good. The timing of the SSA and
CCWT modes from the NAO index (Fig. 7c,d, respectively) is consistent;
however, the wavelet transform performs stronger smoothing. In the aa index,
the CCWT mode is smoother and slightly shifted in time in comparison with the
SSA mode (Fig. 7f,e, respectively).
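
The extraction of an SSA mode by projection on an EOF, and its positioning on the time axis by maximizing the cross-correlation with the original data, might look as follows. This is only a sketch under assumptions: the covariance matrix is estimated from the trajectory matrix, the offset is searched over integer lags within one embedding window, and the names `ssa_mode` and `best_offset` are hypothetical.

```python
import numpy as np

def ssa_mode(x, n, k=0):
    """Project the embedded series on the k-th EOF (ordered by variance).

    Returns a series of length N - n + 1, i.e. a stretch of data equal to
    the embedding window (minus one sample) is lost.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    X = np.lib.stride_tricks.sliding_window_view(x, n)   # shape (N-n+1, n)
    C = X.T @ X / X.shape[0]                             # lag-covariance estimate
    evals, E = np.linalg.eigh(C)
    eof = E[:, np.argsort(evals)[::-1][k]]
    return X @ eof

def best_offset(x, mode, n):
    """Offset (0..n-1) that maximizes the correlation of the mode with the data."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    m = len(mode)
    corr = [np.corrcoef(x[lag:lag + m], mode)[0, 1] for lag in range(n)]
    lag = int(np.argmax(corr))
    return lag, corr[lag]
```

Because the sign of an eigenvector is arbitrary and the correlation is periodic in the lag, several offsets can score almost equally well. This is one way to see why the positioning is not always unambiguous and why a shift by half a period can occur.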

Fig. 7: The oscillatory modes with the approximately 8-year period extracted by 
using SSA (a,c,e) and CCWT (b,d,f) from the near-surface air temperature (a,b), 
the NAO index (c,d), and the aa index (e,f). 

In the monthly sunspot data, the only clearly significant mode in both the
eigenspectrum (Fig. 8a) and the regularity index spectrum is the mode with a
period of 136 months. The long-term trend at the zero-frequency mode lies at
the edge of significance (Fig. 8a). After removal of the 136 month mode and
subsequent analysis of the data residuals, the zero frequency mode becomes
highly significant and another slow mode, with a period of about 80 years,
emerges. Two new significant modes, related to the 11-year solar cycle, appear
in the frequency bins following the frequency bin of the previously identified
mode with the period 136 months. Their periods are 120 and 106 months
(Fig. 8b). After removal of all three modes (i.e., the modes with the periods
136, 120 and 106 months), which can be considered as a decomposition of the
11-year cycle, the standard MCSSA analysis of the sunspot data residuals
uncovers another interesting oscillatory mode in the frequency bin
corresponding to a period of 7.4 years (Fig. 8c). The enhanced MCSSA analysis
of the sunspot data residuals confirms all the modes from the standard MCSSA
(zero frequency and periods of 80 and 7.4 years) and adds two new significant
modes with the periods of 43.5 and 26 months (Fig. 8d).

Fig. 8: Enhanced MCSSA analysis of the monthly sunspot data. Low-frequency parts
of eigenspectra - logarithms of eigenvalues (“LOG POWER”) for the raw sunspot
data (a), the sunspot data after removal of the mode with the period 136 months
(b), and for the sunspot data after removal of the modes with the periods 136,
120 and 106 months (c). (d): Low-frequency part of the regularity index
spectrum for the sunspot data after removal of the modes with the periods 136,
120 and 106 months. Bursts - eigenvalues or regularity indices for the analysed
data; bars - 95% of the surrogate eigenvalues or regularity index distribution,
i.e., the bar is drawn from the 2.5th to the 97.5th percentiles of the
surrogate eigenvalues/regularity indices distribution. The dataset spans the
period 1749-2006; the embedding dimension n = 480 months was used.

It is important to note that the frequency or period accuracy of the SSA
approach is limited by the number of frequency bins given by the embedding
dimension. The accuracy of the frequency (or the period) of a particular mode
can be increased after the extraction of this mode from the original data and
its subsequent spectral or autocorrelation analysis, as Palus & Novotna [22, 31]
have done for the temperature mode. On the other hand, oscillatory modes from
natural processes are never strictly periodic and their frequency is variable.
We illustrate this variability by presenting histograms of instantaneous
frequencies of the two close modes: the mode with the period 7.8 yr from the
Prague temperature (Fig. 9a), and the period 7.4 yr mode obtained from the
sunspot data residuals after the modes related to the 11 yr cycle have been
removed (Fig. 9b). The instantaneous frequencies were obtained by
differentiation of the instantaneous phases [41, 42]. The latter can easily be
computed by applying the analytic signal approach to the two orthogonal
(shifted by π/2) components of each oscillatory mode; see Refs. [39, 43] for
details. Thus the presented histograms are not necessarily equivalent to the
power (Fourier) spectra, but they better reflect possibly nonstationary
fluctuations of the frequencies of the modes. We can see that the most probable
period of the sunspot mode is 7.4 years, with a slight tendency to higher
frequencies (Fig. 9b), while in the case of the temperature mode, the most
probable period is 7.8 years, with considerable weight on lower frequencies
(Fig. 9a). There is, however, a considerable overlap of the frequencies of the
two modes, which gives the possibility of interactions during some time
intervals.

Fig. 9: Histograms of the instantaneous frequencies of the 7.8 yr temperature
mode (a) and the 7.4 yr sunspot mode (b). The thin vertical lines mark the
frequencies corresponding to the period of 8 and 7 years, reading from the left
to the right side.
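
The computation of instantaneous frequencies from the two orthogonal components of a mode can be sketched as follows. This is a simplified illustration, not the procedure of Refs. [39, 43]: it assumes that the two components `s1` and `s2` are already available as a quadrature pair and takes the phase as the angle of the point (s1, s2); the function name is hypothetical.

```python
import numpy as np

def instantaneous_frequency(s1, s2, dt=1.0):
    """Instantaneous frequency (cycles per unit time) of a quadrature pair.

    s1, s2 : the two orthogonal (shifted by pi/2) components of one mode
    dt     : sampling interval, e.g. 1 month
    """
    phase = np.unwrap(np.arctan2(s2, s1))        # continuous instantaneous phase
    return np.gradient(phase, dt) / (2.0 * np.pi)

# Synthetic check: a strictly periodic mode with a period of 94 months
t = np.arange(1200)
f0 = 1.0 / 94.0
freq = instantaneous_frequency(np.cos(2 * np.pi * f0 * t),
                               np.sin(2 * np.pi * f0 * t))
counts, edges = np.histogram(freq, bins=20)      # histogram as in Fig. 9
```

For a strictly periodic mode the histogram collapses to a single bin; for modes from natural processes it spreads out and reflects the variability of the frequency. If the two components are supplied in the opposite order, the frequencies come out negative, so the ordering of the pair is part of the assumption.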

Considering both the available accuracy and the natural variability of the 
frequency of the detected oscillatory modes, the periods given here should 
be understood as limited accuracy estimates of average periods of particular 
modes. 

The common occurrence of the oscillatory modes with the periods of
approximately 11, 5.5, and 2.2 years and in the range 7-8 years in the sunspot
numbers, the aa index, the near-surface air temperature and the NAO index
is summarized in Table 1.

Table 1: Occurrence of the most significant oscillatory modes with periods of
approximately 11, 7-8, 5.5 and 2.2 years in the sunspot numbers, the aa index,
the average near-surface air temperature and the NAO index.


Source 


Period [years] 

We can see that the modes with a period in the range 7-8 years have been
detected in all the analysed datasets. These modes, obtained from the
near-surface air temperature, the NAO index and the geomagnetic aa index, have
already been presented in Fig. 7; the related modes from the sunspot data are
illustrated in Fig. 10. Again, we can compare the mode extracted by SSA in the
natural EOF base (Fig. 10a) with the modes obtained by CCWT with the Morlet
basis [40], using two close central wavelet frequencies corresponding to the
periods 8 yr (Fig. 10b) and 7.4 yr (Fig. 10c). We can see that the
wavelet-extracted modes have a more limited frequency range and that wavelets
with different central frequencies are able to better fit the mode shapes in
different temporal segments dominated by different frequencies.
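
A single-scale complex Morlet transform of the kind used for the wavelet-extracted modes can be sketched as below. The details are assumptions rather than the settings of Ref. [40]: the nondimensional frequency `omega0 = 6`, the commonly used relation between the Morlet scale and the Fourier period, and a kernel truncated at four scales on each side. The function name `morlet_mode` is hypothetical.

```python
import numpy as np

def morlet_mode(x, period, omega0=6.0):
    """Complex Morlet wavelet coefficients of x at one central period.

    period : central period in samples (e.g. 96 for monthly data)
    omega0 : nondimensional Morlet frequency (assumed value)
    The real part is the band-limited mode, the angle its instantaneous phase.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    # scale whose equivalent Fourier period equals `period`
    s = period * (omega0 + np.sqrt(2.0 + omega0 ** 2)) / (4.0 * np.pi)
    half = int(4 * s)                                   # kernel half-width
    u = np.arange(-half, half + 1)
    psi = np.pi ** -0.25 * np.exp(1j * omega0 * u / s - 0.5 * (u / s) ** 2)
    psi /= np.sqrt(s)
    # W(t) = sum_t' x(t') * conj(psi((t' - t) / s)), written as a convolution
    return np.convolve(x, np.conj(psi)[::-1], mode="same")
```

Samples closer to either end of the series than the kernel half-width are affected by edge effects and should be interpreted with care. Calling the function with two nearby periods, for example 96 and 89 months, would mimic the comparison of the 8 yr and 7.4 yr wavelets.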

Fig. 10: The oscillatory mode with the approximately 7.4 yr period obtained from
the sunspot data residuals after removal of the modes related to the 11 yr
cycle, extracted by using SSA (a) and CCWT with the central wavelet frequency
corresponding to periods 8 yr (b) and 7.4 yr (c).


5 Discussion and Conclusion 

The Monte Carlo Singular System Analysis has been extended by evaluating and
testing the regularity of the dynamics of the SSA modes against the colored
noise null hypothesis, in addition to the test based on variance (eigenvalues).
The nonlinear approach to the measurement of regularity and predictability of
the dynamics, based on a coarse-grained estimate of the mutual information,
makes it possible to detect dynamical modes which are more regular than those
obtained by decomposition of colored noise. Using numerical examples, we have
demonstrated that such an enhanced MCSSA test is more sensitive in the
detection of oscillatory modes hidden in a noisy background. There are,
however, some facts about the accuracy and consistency of the results which
should not be neglected. Already in the previous section, we have discussed the
accuracy of the estimation of the period of detected oscillatory modes. We have
stated that we are only able to provide a limited accuracy estimate of an
average period or frequency, since the frequency of oscillatory modes in the
studied natural phenomena is variable. One should keep this fact in mind when
comparing results found in the literature. Not only the frequency, but also the
relative variance and the regularity of the oscillatory modes are variable.
Due to this nonstationary behaviour, any conclusion
about the existence and significance of a mode is dependent on the temporal 
range of analysed data. Obtained eigenvalues and regularity indices give an 
average quantification of the relative variance and regularity, respectively, for 
the analysed time span of the data. It is possible that in some data segments, 
the results can change. Thus it is reasonable to combine the MCSSA analysis 
with a wavelet analysis, using the latter one as an exploratory tool and the 
former one as a hypothesis testing tool. 

Another important question is that of the relevance of the null hypothesis
used. While in many cases the simple AR(1) process seems to work
satisfactorily, it is not generally appropriate, as the case of the sunspot
numbers shows. In this case, the AR(1) process fits not the long-range
dependence in the data, but the short-range correlation inside the 11 yr
cycle. As a consequence, the covariance structures of the data and the null
noise model are not consistent (see Fig. 8a,b,c, where the surrogate bars
overestimate the data eigenvalues). The situation is improved after removal of
the modes related to the 11 yr cycle and, especially in the case of the
regularity test, the null hypothesis seems to be consistent with the noise part
of the data (Fig. 8d). In the further development of the MCSSA, it is desirable
to consider also more sophisticated null hypotheses, including long-range
correlated, fractal and multifractal models, since such properties have been
observed in geophysical data, especially in the long-term air temperature
records [44, 45].

The enhanced MCSSA has been applied to records of monthly average 
near-surface air temperature from several European locations, to the monthly 
NAO index, as well as to the monthly aa index and the sunspot numbers. A 
number of significant oscillatory modes have been detected in all the different 
source data, some of them with common periods (Table 1). While the 11 yr
solar cycle is shared by the solar and geomagnetic data, the quasi-biennial 
mode is present in the atmospheric data and also in the solar data. The mode 
with the period in the range 7-8 years is present in all the analysed data, i.e., 
in the atmospheric temperatures, in the NAO index, in the aa index and in 
the sunspot numbers. 

It is interesting to note that the oscillatory mode with a period of 7.8 
years has been detected in the NAO, in the Arctic Oscillation (AO), in the 
Uppsala winter near-surface air temperature, as well as in the Baltic Sea 
ice annual maximum extent by Jevrejeva and Moore [46]. Applying MCSSA
to the winter NAO index, Gamiz-Fortis et al. [47] detected oscillations with
the period 7.7 years. Moron et al. [48] have observed oscillatory modes with 
the period about 7.5 years in the global sea surface temperatures. Da Costa 
and de Verdiere [49] have detected oscillations with the period 7.7 years in 
interactions of the sea surface temperature and the sea level pressure. Unal 
and Ghil [50] and Jevrejeva et al. [51] observed oscillations with periods 7-8.5 
years in a number of sea level records. Feliks and Ghil [52] report the significant 
oscillatory mode with the 7.8 year period in the Nile River record, Jerusalem 
precipitation, tree rings and in the NAO index. Our first application of the 
enhanced MCSSA [22] yielded the observation of the mode with the period 
7.8 years in near-surface air temperature from several European locations.
Recently, the enhanced MCSSA analyses of the temperature data were refined 
and the analysis of the NAO index was added [31]. In the present work the 
number of processes containing the oscillatory mode with the period in the 
range 7-8 years was extended by the geomagnetic activity aa index and the 
sunspot numbers. 

These findings give a solid basis for further research into the relations
among the dynamics reflected in the analysed data, and thus between the solar
and geomagnetic activity and the climate variability. The existence of
oscillatory modes opens the possibility of applying the recently developed
synchronization analysis [53, 54], which has already found successful
applications in studies of relations between atmospheric phenomena. Maraun &
Kurths [55] discovered epochs of phase coherence between the El Niño/Southern
Oscillation and the Indian monsoon, while Palus & Novotna [39] demonstrated
phase synchronization or phase coherence between the above-mentioned QBO modes
extracted from the temperature and the NAO index. The analysis of instantaneous
phases of oscillatory processes allows the detection of very weak
interactions [53] and also of causality relations if one oscillatory process
drives the other [56, 57]. In such an analysis, Mokhov & Smirnov [58] have
demonstrated that the NAO interacts with (or is influenced by) the other main
global atmospheric oscillatory process, the El Niño/Southern Oscillation. We
believe that the synchronization analysis will help uncover the mechanisms of
the tropospheric responses to the solar and geomagnetic activity and contribute
to a better understanding of the solar-terrestrial relations and their role in
climate change.

Reik Donner

Institute for Transport and Economics, Dresden University of Technology, 
Andreas-Schubert-Str. 23, 01062 Dresden, Germany, 
e-mail: donner@vwi.tu-dresden.de 

Abstract. Coherent or synchronous motion of oscillatory components is a
feature of many geoscientific systems. In this work, we review and compare
different possible approaches to detect and quantify the phase coherence
between time series of oscillatory systems. In particular, methods originating
in the theory of phase synchronisation phenomena and the concept of recurrence
plots are considered. As a particular example, the sunspot activity on both
solar hemispheres and the corresponding phenomenon of north-south asymmetry
are studied. It is shown that this asymmetry can be understood in terms of a
different “phase diffusion” of two coupled chaotic oscillators, which
nevertheless evolve coherently in time. The statistical reliability and
implications of this result are discussed. Apart from the particular problem of
sunspot activity, the methods described in this chapter may be used to study a
variety of other phenomena in geoscientific systems, for example, the coherent
motion of certain atmospheric oscillation patterns.

Keywords: Decadal-scale variability, phase coherence analysis, wavelet 
analysis, sunspots, north-south asymmetry 


1 Introduction 

Many solar and geophysical processes are characterised by coherent
oscillatory components in their dynamics. In particular, the solar activity on
decadal time scales is clearly dominated by the so-called Schwabe cycle with an
average period of about 11 years, which can be observed in terms of indicators
like sunspot numbers, flare activity, or total solar flux. Each of these
“sunspot cycles” is accompanied by a reversal of the polarity of the solar
magnetic field, which means that the magnetic cycle of the Sun is dominated by
a period of roughly 22 years (Hale cycle). Detailed analyses of recent
observations additionally indicate the presence of other distinct periodicities
in the solar activity, ranging from short periods [1] to long-term
components [2, 3, 4] like the Gleissberg (period of about 80-100 years),
Suess/de Vries (210 years), and Hallstatt (2300 years) cycles.

In general, it is known that on longer time scales, the quasi-regular Schwabe
cycle is modulated by long-term fluctuations which affect both its amplitude
and frequency. Observational data on sunspot numbers which record these
variations have been continuously available since the middle of the 19th
century, and have been roughly reconstructed from distinct historical
observations as well as climatological sources with a remarkably high temporal
resolution for the last millennium. These reconstructions show that there were
distinct periods of rather weak solar activity [5, 6], known as the Dalton
(approx. 1790-1820 AD), Maunder (1645-1715 AD), Spörer (1420/50-1550/70 AD),
Wolf (1280-1350 AD), and Oort (1040-1080 AD) minima. Most of these minima have
been associated with certain climatic conditions on the Earth, for example,
very cold winters in Europe during the so-called “little ice age” (showing its
first climatic minimum at about 1650 AD), which coincides well with the Maunder
and Dalton minima. Even for the time before 1000 AD, distinct historical
sources [7] allow time intervals of extraordinarily strong solar activity to
be determined. Moreover, the observation that low solar activity is accompanied
by a reduced net irradiation on the Earth’s surface, and therefore has
signatures in the climate system, has motivated reconstructions based on
high-resolution climate archives like tree rings or sediment as well as ice
core records, which give indirect information about the activity during the
past millennia. On even longer time scales, the abundance of certain cosmic
isotopes (for example, ¹⁰Be) in ice cores can be used to trace variations of
the solar activity.

The variations of solar activity are known not only to trigger long-term
climate change itself; there is also a distinct feedback with the geomagnetic
field, which itself influences the climate on large time scales.
For the decadal-scale variability of the Sun (i.e., the “sunspot” cycle), various 
authors have reported that its signatures can be found in different parts of 
the climate system, from the lower troposphere [8] to the stratosphere [9], 
surface temperatures [10, 11, 12] and the precipitation activity and resulting 
lake-level standings in Central Africa [13]. Recently, it has been suggested that 
the hemispheric asymmetry of decadal-scale solar activity may have a certain 
importance for the Earth’s atmospheric circulation [14]. In contrast to these 
findings, Moore et al. [15] have shown that major atmospheric oscillation 
patterns like the Quasi-Biennial Oscillation (QBO), the Arctic Oscillation 
(AO), and the El Niño/Southern Oscillation (ENSO) are not directly linked
to the Schwabe cycle. In addition, recent studies have proven that the present 
global warming cannot be attributed to a gradual increase of solar activity, 
which follows from the too small amplitude of solar activity variations [16] 
and an opposite direction of both signals [17] during the last decades. 

In this chapter, some techniques will be reviewed that allow the dynamic
signatures of oscillatory variability on the Sun to be traced in different
indicators as well as in the Earth’s climate system. The key feature to be
studied is the phase coherence of distinct oscillatory signals. It has to be
mentioned that efforts have been made to also perform such studies without
pronounced periodic signals (for example, in the case of the relationship
between the El Niño activity and the strength of the Indian
monsoon [18, 19, 20]); however, these efforts have not yet been fully
convincing. In Sect. 2 of this chapter, some methods are introduced that have
recently been suggested for investigating phase synchronisation or, more
generally, phase coherence phenomena in coupled oscillating systems. In
Sect. 3, the dynamic features of the solar activity are studied in some detail,
with a special emphasis on its decadal-scale variability. The asymmetry of both
solar hemispheres is discussed in Sect. 4, whereas all results are summarised
and discussed in Sect. 5.


2 Phase Coherence Analysis 

Over roughly the past 20 years, there has been an increasing interest in the
study of synchronisation phenomena between coupled oscillatory systems in
nature and society [21]. In general, the term synchronisation refers to a process 
of mutual adjustment of oscillations of two distinct, but coupled systems, 
leading from a non-coherent to a coherent motion of the oscillators, which may 
be periodic, quasi-periodic, or chaotic. Even without explicit oscillations, one 
may understand the emergence of a coherent motion of two coupled systems as 
a synchronisation phenomenon (generalised synchronisation). If the coupling 
between the systems is not bivariate, it is however more reasonable to speak 
of a locking instead of synchronisation. 

Among the different types of synchronisation phenomena, the emergence of phase
synchronisation, i.e., a coherent motion with a fixed ratio of average
frequencies, is particularly relevant for the understanding of many situations.
However, for a synchronisation phenomenon in the strict sense, a clear
distinction between two coupled systems is required. If this is not the case
(for example, when studying two observables of the same system), one should
rather speak of phase coherence between the considered oscillations. In the
context of time series analysis, this may lead to an identification problem: if
there is not sufficient information about the particular structure of an
observed system, a phase-coherent motion can be interpreted in terms of two
coupled self-sustained oscillators (i.e., phase synchronisation), two
components or observables of one oscillatory system, one observable viewed
through two (nonlinear) observation functions, etc.

In the following, different approaches to quantify the degree of phase
coherence or phase synchronisation based on time series analysis will be
reviewed. For the presented methods, sufficiently stationary conditions are
assumed, i.e., temporal variations in the presence or degree of phase coherence
are not explicitly considered. In order to study non-stationary phase
coherence, the presented approaches can be applied in terms of a piecewise
analysis of the observations, given a sufficiently high temporal resolution
compared to the time-scale on which the corresponding changes occur.


2.1 Phase Definition and Phase Coherence 


The classical approach of studying the coherent motion of coupled oscillatory 
systems considers the spectral coherence, i.e., the existence of oscillations 
with the same frequency in two or more time series. However, this approach 
requires stationary conditions, i.e., the presumed frequency must contribute 
at all times with equal strength. In real-world systems (in particular, in the 
geosciences, but also in physiological systems), this assumption is usually violated, 
which calls for time-sensitive generalisations like wavelet coherence. In 
contrast to these “frequency coherence” concepts, phase coherence (or phase 
synchronisation) analysis refers to the instantaneous frequency of oscillations 
(i.e., the time derivative of a suitably defined phase variable), which may 
change with time. This generalisation also makes it possible to study the joint 
behaviour of rather complex systems like chaotic oscillators. 

According to the above-mentioned paradigm of instantaneous frequencies, 
traditional phase coherence analysis is based on the proper definition of 
phases. For this problem, there is a variety of different approaches: 

• Poincaré sections of the (possibly embedded) time series may be used to 
define points in time that correspond to fixed phase values $2\pi k$ with $k \in \mathbb{N}$ 
[21]. Between these points, the phase variable is defined via interpolation, 
for example, using piecewise linear functions. 

• If there is a pronounced oscillation (whose frequency and amplitude may 
still vary with time), the analytic signal approach has become a standard 
way of defining a phase variable. In this framework, the Hilbert transform 

$$ Y(t) = \frac{1}{\pi} \, \mathrm{P.V.} \int_{-\infty}^{\infty} \frac{X(\tau)}{t - \tau} \, d\tau \qquad (1) $$

(with P.V. denoting the Cauchy principal value) is used to continue a scalar 
signal $X$ into the complex plane ($Z = X + iY$) [22], where a phase variable 
can be easily assigned as 

$$ \phi(t) = \arctan \frac{Y(t)}{X(t)}. \qquad (2) $$


• In order to make the analytic signal approach applicable to noisy or 
slightly non-coherent oscillators as well, it has been shown that a geometric 
phase definition based on the local derivatives of the time series $X$ and its 
Hilbert transform $Y$ might be helpful (curvature method) [23, 24, 25]: 

$$ \phi(t) = \arctan \frac{dY(t)/dt}{dX(t)/dt}. \qquad (3) $$

• As a generalisation of the frequency coherence approach, one may use 
a wavelet transform [26, 27, 28, 29, 30, 31] or other methods of time-frequency 
analysis like the Gabor transform [32] to select the (temporally 
variable) strength of oscillations on a fixed reference frequency in terms of 
both amplitude and phase. If the frequency of the dominating oscillation 
pattern itself varies significantly, one may also consider these variations 
using the phases belonging to the frequency with the largest local oscillation 
amplitude. As a further alternative, coherent oscillatory modes with 
possibly variable frequencies can be selected using empirical mode decomposition 
[33, 34], bandpass filtering in the Fourier space [35], independent 
component analysis [36] or other suitable methods, before being used for 
defining a phase variable by the analytic signal approach. 
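
The analytic signal approach of Eqs. (1) and (2) can be sketched in a few lines of Python. The following is a minimal illustration rather than the implementation used for the results in this chapter: it assumes an evenly sampled scalar series, replaces the continuous Hilbert transform by its usual discrete Fourier counterpart (negative frequencies set to zero), and uses a naive O(N^2) transform so that no external library is needed. The function names `analytic_signal` and `hilbert_phase` and the test signal are purely illustrative.

```python
import cmath
import math


def analytic_signal(x):
    """Return Z = X + iY, where Y is a discrete Hilbert transform of x."""
    n = len(x)
    # Forward discrete Fourier transform (naive sum, fine for short series).
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep the zero (and Nyquist) frequency once, double the positive
    # frequencies, and discard the negative ones.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif k < (n + 1) // 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    # The inverse transform yields the complex analytic signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def hilbert_phase(x):
    """Phase of Eq. (2), wrapped to [-pi, pi] via the four-quadrant arctangent."""
    return [math.atan2(z.imag, z.real) for z in analytic_signal(x)]


# Illustrative signal: exactly 4 cycles of a cosine in 64 samples.
signal = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
phases = hilbert_phase(signal)
```

For this test signal the phase advances by 2π·4/64 per sample, as expected for a pure harmonic oscillation. The four-quadrant arctangent is used instead of a plain arctangent of Y/X so that the phase covers the full circle; for long series a fast Fourier transform would replace the naive sums.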


Having thus defined the phase variables for two time series, one may compare 
the joint evolution of the phases of both systems. In general, the phases 
increase with an average rate corresponding to the average frequency of the 
systems. If this average frequency is the same in both systems, their phase 
difference $\delta\phi(t) = \phi_1(t) - \phi_2(t)$ is bounded. In a more general way, one may 
define m : n phase synchronisation if the average frequencies of both systems 
have the ratio m : n with m and n being integers. Remaining phase differences 
correspond to different fluctuations of the instantaneous frequencies around 
their average values, corresponding to the so-called phase diffusion. This term 
is motivated by the fact that for chaotic oscillators, the dynamics of the phase 
after subtracting a linear increase according to the average frequency may 
resemble a stochastic diffusion process [21]. 

According to these general considerations, measures of phase coherence 
may be defined based on either the statistical properties of the phase differences 
or the joint evolution of the phases. Whereas the latter approach 
is realised in terms of the mutual information between the phase variables 
[37], suitable statistical properties of the phase differences are their standard 
deviation, normalised Shannon entropy [38], or the Rayleigh measure (mean 
resultant length) 

$$ r = \left| \frac{1}{N} \sum_{j=1}^{N} e^{i \, \delta\phi(t_j)} \right|, \qquad (4) $$

where N is the total length of the considered time series. As the last quantity 
has a direct interpretation in terms of directional statistics, it is convenient 
to use it as a corresponding measure in all further analyses. It has to be 
mentioned that the power of the different statistics may be very different. In 
order to statistically test for the presence of phase synchronisation, bootstrap 
approaches have been proposed [39]. However, although (frequency and) phase 
coherence methods are usually sensitive, it has been shown that they are not 
specific in distinguishing between coupled self-sustained oscillators and time 
series being connected by a certain transfer function [40], which is closely 
related to the above-mentioned identification problem. 
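
As a small illustration of the Rayleigh measure of Eq. (4), the following Python sketch computes the mean resultant length and, for comparison, the standard deviation of a list of phase differences. The function names and the two toy inputs are assumptions made for this example only; the population form of the standard deviation is an arbitrary choice.

```python
import cmath
import math


def mean_resultant_length(phase_differences):
    """Rayleigh measure r of Eq. (4): modulus of the mean unit phasor."""
    n = len(phase_differences)
    return abs(sum(cmath.exp(1j * d) for d in phase_differences)) / n


def phase_difference_std(phase_differences):
    """Standard deviation of the phase differences (population form)."""
    n = len(phase_differences)
    mean = sum(phase_differences) / n
    return math.sqrt(sum((d - mean) ** 2 for d in phase_differences) / n)


# Perfectly locked phases: constant phase difference, hence r = 1.
locked = [0.3] * 100
# Phase differences spread evenly over the circle, hence r = 0.
drifting = [2 * math.pi * j / 100 for j in range(100)]

r_locked = mean_resultant_length(locked)
r_drifting = mean_resultant_length(drifting)
```

Values of r close to 1 indicate a sharply peaked distribution of phase differences, whereas values close to 0 indicate phase differences that are spread over the whole circle. Because r operates on unit phasors, it is insensitive to the 2π wrapping of the phases, which is not true for the plain standard deviation.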

2.2 Recurrence Plots: Phase Coherence Analysis Without Phases 

As an alternative to the “traditional” phase synchronisation analysis described 
above, one may use a topological approach which is based on the concept of 
recurrence plots [41, 42]. This method was originally designed as a tool 
for visualising the correlation pattern within a single time series by comparing 
the values at all times $t_i$ with all observations at other times $t_j$. A simple 
graphical representation is obtained by comparing the distance between 
every pair of values with a prescribed threshold value $\varepsilon$; the result 
is then binarily encoded according to the corresponding order relationship. 
Mathematically, this leads to the so-called recurrence matrix 

$$ R_{ij} = \Theta(\varepsilon - \| X_i - X_j \|) \qquad (5) $$

(where $\Theta(\cdot)$ is the Heaviside function, $X_i = X(t_i)$, and $\| \cdot \|$ some suitable 
norm), which depends on the specific choice of $\varepsilon$. Note that in order to enhance 
the meaning of non-zero entries as representatives of a similar dynamics, in 
Eq. (5), the time series X is often replaced by a suitably embedded version of 
itself. The graphical visualisation of R in terms of a black-white structure is 
called a recurrence plot. As a particularly remarkable feature, it follows from 
the definition that $R_{ii} = 1$ independent of $\varepsilon$. The corresponding main diagonal 
structure in the plot is called the line of identity (LOI). Besides the intuitive 
interpretation of the emerging structures in a recurrence plot in terms of a 
similar dynamics, statistics on the distributions of continuous vertical and 
diagonal structures allow the estimation of a variety of dynamic invariants [42]. 
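
The construction of the recurrence matrix in Eq. (5) can be illustrated with a short Python sketch. It is a simplified example under explicit assumptions: the series is scalar and not embedded, the norm is the absolute value, and the Heaviside function is taken to be 1 at zero. The names `recurrence_matrix` and `render`, the toy series, and the threshold are illustrative.

```python
def recurrence_matrix(x, eps):
    """R[i][j] = 1 if |x_i - x_j| <= eps, else 0 (Eq. 5 for a scalar series)."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)] for i in range(n)]


def render(matrix):
    """Text version of a recurrence plot: '#' for recurrent pairs, '.' otherwise."""
    return "\n".join("".join("#" if v else "." for v in row) for row in matrix)


# Toy series alternating between values near 0 and values near 1.
series = [0.0, 1.0, 0.1, 1.1, 0.05]
R = recurrence_matrix(series, eps=0.2)
plot = render(R)
```

The main diagonal of `R` consists of ones only (the line of identity), and the matrix is symmetric because the distance between two states does not depend on their order. For an embedded series, the scalar values would be replaced by delay vectors and the absolute value by a vector norm.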

The concept of recurrence plots may be extended to study the joint evolution 
of two coupled systems. If X and Y are two time series reflecting the 
dynamics of the same physical quantity, cross-recurrence plots are defined 
as [43] 

$$ CR_{ij} = \Theta(\varepsilon - \| X_i - Y_j \|). \qquad (6) $$

In contrast to univariate recurrence plots, the line of identity is no longer 
present in cross-recurrence plots. However, if the dynamics of X and Y are 
topologically similar, other continuous structures emerge as both considered times 
$t_i$ and $t_j$ increase. The corresponding structure next to the main diagonal, the 
line of synchronisation (LOS) [44], can be understood as a representative of 
the phase difference between the two systems. 

Recurrence plots may be used to detect different types of synchronised 
dynamics. In particular, signatures of phase synchronisation may be isolated 
without explicitly defining a phase variable [45, 46]. For this purpose, one 
considers the diagonal-wise recurrence rate 

$$ p_X(\tau) = \frac{1}{N - \tau} \sum_{i=1}^{N-\tau} \Theta(\varepsilon - \| X_i - X_{i+\tau} \|), \qquad (7) $$

which may be understood as a generalised auto-correlation function of the 
time series X [47, 48]. One may convince oneself that the presence of phase 
synchronisation between two time series X and Y requires that the resulting 
generalised auto-correlation functions show the same behaviour. According to 
this, the correlation coefficient between these two functions, 

$$ \mathrm{CPR} = \langle \bar{p}_X(\tau) \, \bar{p}_Y(\tau) \rangle_\tau \qquad (8) $$

(where $\bar{p}$ stands for the function p standardised to zero mean and unit 
variance), can be used as an indicator of phase synchronisation. In order to test the 
significance of this measure, the concept of twin surrogates is used. Here, one 
of the original time series is used to successively generate independent, 
topologically consistent replications of the time series within which random points 
are replaced by others with the same local neighbourhood, i.e., the same 
column in the recurrence plot [49]. Bootstrapping one of the systems in this way 
and computing the CPR indices with respect to the original time series of the 
other system for every realisation, one obtains a distribution of index values. 
Comparing the CPR value obtained from the original two systems with 
this distribution, one may draw conclusions about the statistical significance of CPR 
and, hence, about the presence of phase synchronisation. Note that a significant 
difference requires the presence of remaining phase differences between 
both systems, i.e., a different “phase diffusion”. If this is not the case (for 
example, in the case of complete synchronisation), it is no longer possible 
to use this approach to test for the presence of phase synchronisation. 
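
The diagonal-wise recurrence rate of Eq. (7) and the CPR index of Eq. (8) can be sketched as follows. This is a minimal example and not the code behind the cited results: it assumes scalar, non-embedded series with the absolute value as norm, evaluates the lags 1 to `max_lag` (a hypothetical parameter), and does not include the twin surrogate test. The two sine series are illustrative only.

```python
import math


def diagonal_recurrence_rate(x, tau, eps):
    """p(tau) of Eq. (7) for a scalar series and the absolute-value norm."""
    n = len(x)
    hits = sum(1 for i in range(n - tau) if abs(x[i] - x[i + tau]) <= eps)
    return hits / (n - tau)


def standardise(values):
    """Shift and scale to zero mean and unit (population) variance.

    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def cpr(x, y, eps, max_lag):
    """Correlation of the two diagonal-wise recurrence rates, Eq. (8)."""
    lags = range(1, max_lag + 1)
    px = standardise([diagonal_recurrence_rate(x, tau, eps) for tau in lags])
    py = standardise([diagonal_recurrence_rate(y, tau, eps) for tau in lags])
    return sum(a * b for a, b in zip(px, py)) / max_lag


# Two illustrative oscillations with different periods (20 and 27 samples).
x = [math.sin(2 * math.pi * t / 20) for t in range(400)]
y = [math.sin(2 * math.pi * t / 27 + 1.0) for t in range(400)]

cpr_xx = cpr(x, x, eps=0.2, max_lag=100)
cpr_xy = cpr(x, y, eps=0.2, max_lag=100)
```

By construction, `cpr_xx` equals 1 up to rounding, since a series shares its recurrence structure with itself, and any CPR value lies between -1 and 1. Whether a given value is significant has to be judged against a surrogate distribution as described above.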

2.3 Example: Two Coupled Rössler Oscillators 

To illustrate the performance of different approaches to phase synchronisation 
analysis, we review the paradigmatic example of two chaotic Rössler oscillators 
that are diffusively coupled via their y-coordinates [25, 45, 50]: 

$$ \dot{x}_{1,2} = -\omega_{1,2} \, y_{1,2} - z_{1,2}, \qquad (9) $$

$$ \dot{y}_{1,2} = \omega_{1,2} \, x_{1,2} + a \, y_{1,2} + \mu \, (y_{2,1} - y_{1,2}), \qquad (10) $$

$$ \dot{z}_{1,2} = b + z_{1,2} \, (x_{1,2} - c), \qquad (11) $$

where $\mu$ denotes the coupling strength. 

The Rössler system is known to show chaotic oscillations on a relatively broad 
frequency band. Moreover, there are two distinct time scales of the dynamics, 
which correspond to the short-term fluctuations of the oscillation amplitude 
(with high values of entropy and fractal dimension) and the long-term phase 
diffusion (with lower values of entropy and fractal dimension) [51]. As will be 
shown later, this superposition of short- and long-term dynamics can be seen 
as an analogy of the sunspot activity, which varies in a roughly similar way. 

When the parameter a of the Rössler system is tuned to a certain range of values, 
there is a transition from coherent oscillations (i.e., oscillations around a 
well-defined origin in the x-y plane) to the non-coherent Funnel regime. As 
in earlier studies [25], the following parameters have been chosen as an illustrative 
example for the performance of different methods for detecting and 
quantifying phase coherence: $\omega_1 = 0.98$, $\omega_2 = 1.02$, $b = 0.1$, and $c = 8.5$. For 
the parameter a, values of 0.16 (standard phase-coherent Rössler system) and 
0.2925 (non-coherent Funnel regime) have been used. 
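
For readers who wish to reproduce the example, the coupled system of Eqs. (9)-(11) can be integrated numerically, for instance as in the following Python sketch. Several choices here are assumptions that are not specified in the text: the fixed-step fourth-order Runge-Kutta scheme, the step size `dt = 0.05`, the number of steps, the initial conditions, and the variable name `mu` for the coupling strength.

```python
def coupled_roessler_rhs(state, omega1, omega2, a, b, c, mu):
    """Right-hand side of Eqs. (9)-(11) for the state (x1, y1, z1, x2, y2, z2)."""
    x1, y1, z1, x2, y2, z2 = state
    return (
        -omega1 * y1 - z1,
        omega1 * x1 + a * y1 + mu * (y2 - y1),
        b + z1 * (x1 - c),
        -omega2 * y2 - z2,
        omega2 * x2 + a * y2 + mu * (y1 - y2),
        b + z2 * (x2 - c),
    )


def rk4_step(state, dt, **params):
    """One classical fourth-order Runge-Kutta step."""
    k1 = coupled_roessler_rhs(state, **params)
    k2 = coupled_roessler_rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], **params)
    k3 = coupled_roessler_rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], **params)
    k4 = coupled_roessler_rhs([s + dt * k for s, k in zip(state, k3)], **params)
    return [
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    ]


def integrate(state, dt, steps, **params):
    """Return the list of states visited, including the initial one."""
    trajectory = [list(state)]
    for _ in range(steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory


# Phase-coherent regime with the parameter values quoted in the text;
# initial conditions and step size are arbitrary illustrative choices.
params = dict(omega1=0.98, omega2=1.02, a=0.16, b=0.1, c=8.5, mu=0.05)
trajectory = integrate([1.0, 1.0, 0.0, 1.5, 0.5, 0.0], dt=0.05, steps=2000, **params)
```

In practice, an initial transient would be discarded before any phase or recurrence analysis, and the Funnel regime is obtained by setting `a=0.2925` instead of `a=0.16`.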

Fig. 1: Left panels: Normalised phase differences (within the interval $[-\pi, \pi]$) between 
the phases of the two Rössler systems (coupled with $\mu = 0.05$) estimated with 
a complex Morlet wavelet on different scales for the respective x, y, and z components 
(from top to bottom). Right panels: Resulting standard deviations $\sigma(\delta\phi)$ (left) 
and mean resultant lengths r (right) of the phase differences as a function of the 
considered reference period. 

In the phase-coherent regime, the standard Hilbert transform approach is 
well suited to define a meaningful phase variable. In order to illustrate the 
performance of the wavelet-based approach, a coupling strength of $\mu = 0.05$ 
has been applied, which corresponds to a phase synchronised system [45]. In 
Fig. 1, the results of a scale-resolved phase synchronisation analysis are shown 
for all three components of both systems. One may see that there is a broad 
continuum of frequencies on which phase synchronisation may be detected. In 
the case of the ^-components, this range is split into two frequency bands of 
coherent motion in the phase variables. 
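
The scale-resolved phases used in such an analysis can be illustrated with a direct, unoptimised evaluation of a complex Morlet wavelet coefficient at one reference period. The sketch below rests on assumptions of its own: the centre frequency `omega0 = 6`, the simple relation scale = omega0 · period / (2π) between scale and reference period, the omission of the wavelet normalisation (which does not affect the phase), and the toy input signal. Boundary effects are not treated.

```python
import cmath
import math


def morlet_phase(x, period, omega0=6.0):
    """Phase of the complex Morlet wavelet coefficient at each time step."""
    n = len(x)
    scale = omega0 * period / (2.0 * math.pi)
    phases = []
    for t in range(n):
        coeff = 0j
        for u in range(n):
            eta = (u - t) / scale
            # Complex conjugate of the Morlet mother wavelet, without the
            # constant normalisation factor.
            coeff += x[u] * cmath.exp(-1j * omega0 * eta) * math.exp(-0.5 * eta * eta)
        phases.append(cmath.phase(coeff))
    return phases


def wrap(angle):
    """Map an angle (for example a phase difference) to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


# Illustrative signal: cosine with a period of 20 samples.
signal = [math.cos(2 * math.pi * t / 20) for t in range(200)]
phases = morlet_phase(signal, period=20)
```

Away from the boundaries, the wavelet phase of this test signal follows the phase of the cosine itself. Applying `morlet_phase` to two series at the same reference period and passing the wrapped differences to a coherence measure such as the mean resultant length gives one point of a curve like those shown in the right panels of Fig. 1.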

Considering the transition from non-synchronised to phase synchronised 
and completely synchronised conditions (see Fig. 2), one may observe that 
the applied methodology is only capable of identifying phase synchronisation. 
Indeed, this is also true for other methods of phase synchronisation analysis. 
For example, Romano [50] reported that the recurrence plot based CPR index 
shows values of 0.115 for $\mu = 0.02$ (non-synchronised systems) and 0.998 for 
$\mu = 0.05$ (phase synchronisation). The transition between low and high values 
of this index was found to be relatively sharp for the considered parameters. 
Looking in some more detail at the involved frequencies as resolved by the 
wavelet based method, it turns out that whereas the phase coherence of the 
two chaotic attractors is restricted to a certain frequency band, a corresponding 
coherence is found for a much broader range of reference periods in the 
case of complete synchronisation. 

Fig. 2: Left panels: Normalised phase differences between the phases of the x variables 
of the two coupled Rössler systems (estimated with a complex Morlet wavelet 
on different scales) for coupling strengths of $\mu = 0.02$ (no synchronisation), $\mu = 0.05$ 
(phase synchronisation), and $\mu = 0.2$ (complete synchronisation) (from top to bottom). 
Right panels: Resulting standard deviations $\sigma(\delta\phi)$ (left) and mean resultant 
lengths r (right) as a function of the considered reference period. 

In contrast to the regime of coherent chaotic oscillations, in the Funnel 
regime, a much wider range of periodicities is present in the wavelet spectrograms. 
Applying the wavelet based approach to phase synchronisation, it turns 
out that the frequency-dependence of the phase synchronisation index r can 
hardly be distinguished from the signature of complete synchronisation in the 
case of coherent oscillations (see Fig. 3). This result shows that the width of a 
possible coherent frequency range can hardly be considered as an indicator for 
the transition from phase to complete synchronisation of chaotic oscillators. 

Fig. 3: Left panels: Normalised phase differences between the phases of the x variables 
of the two coupled Rössler systems in the Funnel regime (estimated with a 
complex Morlet wavelet on different scales) for coupling strengths of $\mu = 0.05$ (no 
synchronisation, top) and $\mu = 0.2$ (phase synchronisation, bottom). Right panels: 
Resulting standard deviations $\sigma(\delta\phi)$ (left) and mean resultant lengths r (right) as a 
function of the considered reference period. 


3 Decadal-Scale Solar Activity 

3.1 Description of the Data 

The decadal-scale variability of solar activity can be observed in a variety 
of different observables, including sunspot numbers and areas, the 10.7-cm 
radio flux, and the total solar irradiance arriving at the Earth. This study 
will exclusively focus on sunspot observations. As different measures, the total 
sunspot areas A (in units of millionths of a hemisphere), their values $A_n$ and 
$A_s$ for the northern and southern hemisphere of the Sun, and the group and 
international (Wolf, Zurich) sunspot numbers are considered. These quantities 
have the advantage that observational data are available which cover a 
sufficiently long period of time (i.e., several solar activity cycles). 

Whereas the sunspot areas are absolute quantities, sunspot numbers are 
relative index values. If g and n are the numbers of identified sunspot groups 
and individual spots, respectively, the international sunspot number $R_I$ (or 
$R_Z$) is defined as 

$$ R_I = k \, (10 \, g + n), \qquad (12) $$

where k is a correction factor for the considered observer. In a similar way, but 
with a slightly inconsistent normalisation, the American sunspot numbers $R_A$ 
can be considered [52]. As a more objective measure, Hoyt and Schatten [53] 
introduced the group sunspot number $R_G$ using data from different observers 
$i = 1, \ldots, N$ as 

$$ R_G = \frac{12.08}{N} \sum_{i=1}^{N} k_i \, g_i, \qquad (13) $$

where $k_i$ and $g_i$ are the correction factor and the number of sunspot groups 
reported by the i-th observer. The above definition made it possible to extend 
the time series of relative sunspot numbers back to 1610. For this purpose, in 
addition to the observations made at 
the Zurich observatory, other references have been taken into account. By the 
corresponding extension of the sunspot time series, studies based on direct 
observations of the solar activity during the Maunder minimum became possible 
[54, 55, 56]. In addition to the numbers given for the entire Sun, hemispherically 
resolved values of the international sunspot numbers (hereafter called 
$R_I^n$ and $R_I^s$, respectively) have been available since 1992. An extended catalogue for 
the time interval 1945-2004 has been provided by Temmer et al. [57, 58], 
combining observations from two Austrian and Slovakian observatories that 
have been normalised to be consistent with the values of $R_I$. For convenience, 
in this contribution, the relative sunspot numbers from this catalogue will be 
referred to as $R_n$ and $R_s$, respectively. 
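
The two index definitions of Eqs. (12) and (13) amount to simple arithmetic, as the following Python sketch shows. The function names, the default correction factor of 1, and the numbers in the example are hypothetical; in particular, the observer-specific correction factors $k_i$ in Eq. (13) are an assumption about the garbled original formula and would in practice come from the respective catalogues.

```python
def international_sunspot_number(groups, spots, k=1.0):
    """Eq. (12): R_I = k * (10 * g + n) for one observer."""
    return k * (10 * groups + spots)


def group_sunspot_number(observations):
    """Eq. (13): R_G = 12.08 / N * sum_i k_i * g_i.

    `observations` is a list of (k_i, g_i) pairs, one per observer.
    """
    n = len(observations)
    return 12.08 / n * sum(k * g for k, g in observations)


# Hypothetical day: one observer counts 3 groups containing 12 spots in total.
r_i = international_sunspot_number(groups=3, spots=12)

# Hypothetical day: two observers with unit correction factors report
# 5 and 7 groups, respectively.
r_g = group_sunspot_number([(1.0, 5), (1.0, 7)])
```

With these inputs, `r_i` is 42 and `r_g` is 12.08 times the mean group count of 6, i.e. 72.48. The weighting of groups by a factor of 10 in Eq. (12) means that $R_I$ is dominated by the group count when only a few spots are visible.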

The main statistical characteristics of the different relative sunspot numbers, 
sunspot areas, and the 10.7-cm radio flux have been reported to be 
consistent with each other [4, 59]. In the following, recent results on the linear 
as well as non-linear statistical properties of sunspot time series will be 
reviewed and further extended. In particular, it will be examined on which 
scales and to which degree the aforementioned observables are actually 
phase-coherent. 

Table 1: Time coverage of the sunspot data used in this study. 

Quantity     Coverage (daily)      Coverage (monthly)   Source 
R_I          1849(1818)-present    1749-present         [A] 
R_I^{n,s}    1992-2006             1992-2006            [A] 
R_A          1944-present          1944-present         [A] 
R_G          1796(1610)-1995       1610-1995            [A] 
R_{n,s}      1945-2004             1945-2004            [B] 
A            -                     1874-present         [C] 
A_{n,s}      -                     1874-present         [C] 

[A] http://www.ngdc.noaa.gov/stp/SOLAR/ftpsunspotnumber.html 
[B] http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/447/735 
[C] http://solarscience.msfc.nasa.gov/greenwch.shtml 

3.2 Signatures of Low-Dimensional Chaos 

In his seminal work on auto-regressive spectral estimation, Yule [60] described 
the time series of relative sunspot numbers as a disturbed harmonic 
process, considering the perturbation to be an auto-regressive process of second 
order. In later work, the presence of exclusively linear-stochastic behaviour 
in the sunspot record has been excluded by surrogate data testing 
[61]. Figure 4 shows the significance of the time irreversibility (Q) statistics 
for the monthly record of the international sunspot number. Here, a 
significance value of S means that the normalised cubed difference 
$Q(\tau) = \langle (X_{t+\tau} - X_t)^3 \rangle / \langle (X_{t+\tau} - X_t)^2 \rangle$ of the original time series lies outside of the 
S-fold standard deviation of the corresponding values obtained from a set of 
CAAFT surrogates [62]. The presented results for the time series of annual 
averages are in excellent agreement with the findings of Theiler et al. [61]. If, 
however, the correlation dimension is used as discriminating statistics, Theiler 
et al. found that the sunspot data are indiscernible from the AAFT surrogates. 
This underlines the importance of a proper choice of the statistics in 
testing against linear-stochastic dynamics. 
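
The time irreversibility statistics and the associated significance can be sketched in Python as follows. The example covers only the statistic itself and the comparison with a given list of surrogate values; generating CAAFT surrogates is beyond this sketch. The use of the sample standard deviation, the function names, and the test series are assumptions, and the error estimate follows the expression quoted from [61] in the caption of Fig. 4.

```python
import math
import statistics


def time_irreversibility(x, tau):
    """Q(tau) = <(x[t+tau] - x[t])^3> / <(x[t+tau] - x[t])^2>."""
    diffs = [x[t + tau] - x[t] for t in range(len(x) - tau)]
    cubed = sum(d ** 3 for d in diffs) / len(diffs)
    squared = sum(d ** 2 for d in diffs) / len(diffs)
    return cubed / squared


def significance(q_original, q_surrogates):
    """Return (S, delta_S) of the original value against surrogate values."""
    mu_q = statistics.mean(q_surrogates)
    sigma_q = statistics.stdev(q_surrogates)
    s = abs(q_original - mu_q) / sigma_q
    delta_s = math.sqrt((1.0 + s ** 2 / 2.0) / len(q_surrogates))
    return s, delta_s


# A sawtooth (slow rise, sudden drop) is strongly time-irreversible ...
sawtooth = [t % 10 for t in range(101)]
q_sawtooth = time_irreversibility(sawtooth, tau=1)

# ... whereas a sine sampled over whole periods is time-reversible.
sine = [math.sin(2 * math.pi * t / 20) for t in range(203)]
q_sine = time_irreversibility(sine, tau=3)

# Hypothetical surrogate values, only to show the call signature.
s, delta_s = significance(5.0, [1.0, 2.0, 3.0])
```

For the sawtooth, nine increments of +1 and one of -9 per period give Q = -8, while Q vanishes for the sine. The sign of Q thus distinguishes slow-rise/fast-decay from fast-rise/slow-decay behaviour, which is why the statistic is sensitive to the asymmetric shape of the sunspot cycle.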

It has to be pointed out that the significance of the test using the Q 
statistics shows almost no dependence on the resolution of the time series. In 
particular, moving average filtering has no significant effect unless variations 
are smoothed on a scale of several years. This suggests that the observed 
signature of nonlinearity is not an effect of short-term fluctuations, but rather of 
the decadal-scale variability dominating the record. This hypothesis is further 
underlined by the fact that the quantity $(Q_{orig} - \mu_Q) / \sigma_Q$ (whose absolute 
value is just the significance index considered here) shows (as a function of 
the time shift $\tau$) a periodicity of about 11 years, which corresponds to the 
average period of the solar Schwabe cycle. 

Fig. 4: Significance $S(\tau) = |Q_{orig}(\tau) - \mu_Q(\tau)| / \sigma_Q(\tau)$ (where $Q_{orig}(\tau)$ is the normalised 
cubed difference of the original time series, and $\mu_Q(\tau)$ and $\sigma_Q(\tau)$ are the 
mean and standard deviation of the same statistics computed over $N_{surr} = 1000$ 
realisations of the CAAFT surrogate algorithm (left panel) and of the Barnes model 
after transforming its data to a Gaussian distribution (right panel)) of the test 
against linear-stochastic behaviour in the monthly international sunspot numbers. 
Dashed lines very close to the main solid curve indicate error estimates of the significance 
given by $\Delta S(\tau) = \sqrt{(1 + S^2(\tau)/2) / N_{surr}}$ [61]. Note that the Barnes model 
relates to the annual mean sunspot numbers, which leads to a remarkably lower 
temporal resolution in the plot. 

It has to be underlined that the violation of the hypothesis of a stationary 
linear-stochastic process may be explained by nonstationarities, nonlinear 
stochastic models, or deterministic chaos. A more detailed discrimination between 
these alternatives requires refined statistical techniques and is beyond 
the scope of this work. Considering the hypothesis of a nonlinear stochastic 
process, Barnes et al. [63] modelled the variability of annual mean sunspot 
numbers by a narrowband Gaussian stochastic process, which was defined by 
a nonlinear mapping $Y_n = Z_n^2 + a \, (Z_{n-1}^2 - Z_{n-2}^2)^2$ where $Z_n$ was assumed to 
be an ARMA[2,2] process. Realisations of this model show variability patterns 
that resemble those of the sunspot activity with quasi-regular cycles and 
superimposed long-term amplitude variations. However, a more detailed analysis 
reveals that this model is very likely not capable of reproducing the sunspot 
activity time series of the last 300 years, which can be shown by either considering 
the corresponding amplitude-frequency relationship [64] or the time 
irreversibility measure Q as a discriminating statistics for both original (annual) 
sunspot numbers and realisations of the Barnes model. In particular, 
the qualitative behaviour of the corresponding significance levels strongly 
resembles that obtained with CAAFT surrogates (cf. Fig. 4). As a potential 
improvement, Tsai and Chan [65] found that a fourth-order threshold autoregressive 
(NLCAR[4]) model may explain the yearly sunspot record much 
better than the Barnes model. 

The above-mentioned approaches to modelling the dynamics of the sunspot 
activity have been based on the assumption of a process that can be exclusively 
described by stochastic fluctuations. With the development of the theory of 
nonlinear dynamical systems, however, increasing evidence was found that at 
least the decadal-scale solar activity cycle is better described by deterministic 
chaotic processes. Various authors considered measures like fractal or correlation 
dimension, Lyapunov exponents, or entropies of the sunspot time series to 
characterise the complexity and predictability of the underlying attractor, e.g. 
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]. Paluš and 
co-workers provided statistical evidence that the corresponding process can 
be understood as a driven nonlinear oscillator, which is weakly synchronised 
with the solar internal motion [84]. 

Watari [85] showed that a suitable decomposition into components which 
vary on different time scales allows a separation of periodic, chaotic, and random 
components. His results are in qualitative agreement with the findings of 
Qin [86], who reported that low-dimensional chaos can be found on time scales 
of 8 years or longer, whereas the behaviour on shorter time scales has to be 
described by high-dimensional chaos or stochastic processes. In order to further 
validate these findings, some additional analyses have been carried out. 
For this purpose, the full record of monthly international sunspot numbers 
has been subjected to a wavelet decomposition in order to extract components 
that vary on periods of 1-240 months. As a next step, the recurrence 
matrices of the resulting time series have been computed and used for estimating 
different nonlinear quantitative measures, which are based on statistics 
of the lengths of continuous diagonal as well as vertical structures in the 
recurrence plots. 
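
One of the diagonal-line measures referred to above, the degree of determinism, can be sketched in Python as follows. The definition used here (fraction of off-diagonal recurrence points that belong to diagonal lines of at least `min_length` points) is the common one in recurrence quantification analysis, but the minimum line length of 2, the exclusion of the line of identity, the scalar non-embedded series, and all names are assumptions of this example. Laminarity is obtained analogously from vertical instead of diagonal lines.

```python
def recurrence_matrix(x, eps):
    """R[i][j] = 1 if |x_i - x_j| <= eps, else 0 (scalar series)."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)] for i in range(n)]


def diagonal_line_lengths(matrix):
    """Lengths of all diagonal lines above and below the main diagonal."""
    n = len(matrix)
    lengths = []
    for offset in range(1, n):
        for upper in (True, False):
            run = 0
            for i in range(n - offset):
                value = matrix[i][i + offset] if upper else matrix[i + offset][i]
                if value:
                    run += 1
                else:
                    if run:
                        lengths.append(run)
                    run = 0
            if run:
                lengths.append(run)
    return lengths


def determinism(matrix, min_length=2):
    """Share of recurrence points lying on diagonal lines >= min_length."""
    lengths = diagonal_line_lengths(matrix)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length for length in lengths if length >= min_length) / total


def mean_diagonal_length(matrix, min_length=2):
    """Average length of the diagonal lines with at least min_length points."""
    lengths = [l for l in diagonal_line_lengths(matrix) if l >= min_length]
    return sum(lengths) / len(lengths) if lengths else 0.0


# A strictly periodic series produces only long diagonal lines ...
periodic = [t % 5 for t in range(20)]
det_periodic = determinism(recurrence_matrix(periodic, eps=0.5))

# ... whereas this irregular toy series produces only isolated points.
irregular = [0, 1, 0, 2, 3, 0]
det_irregular = determinism(recurrence_matrix(irregular, eps=0.5))
```

The periodic series yields a determinism of 1 and the irregular one a determinism of 0, which are the two extremes between which the values for filtered real-world series fall.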

In Fig. 5, some of the results are shown. All measures indicate that there 
is a broad range of reference periods between about 5 and 15 years, on which 
the recurrence plots show strong dynamic regularities suggesting mainly deterministic 
processes. For shorter as well as longer time scales, the degree of 
determinism is significantly smaller, which supports the results of Qin [86]. 
Following [82] where it was pointed out that the z-component of a phase-coherent 
Rössler oscillator resembles the dynamics of the sunspot activity, the 
same computations have also been carried out for the system studied in Sect. 2.3. 
Whereas the existence of a broad frequency band with a high degree of determinism 
is indeed similar for both systems, the Rössler attractor also shows a 
very high degree of determinism on short time scales, which differs from the 
behaviour of the sunspot number time series. Hence, one may conclude that 
the short-term behaviour (one month to some years) of the observed solar 
activity is actually dominated by dynamics that cannot be described by a 
low-dimensional chaotic attractor. 

Fig. 5: Upper panels: Left: Degree of determinism (solid) and laminarity (dashed) 
of the wavelet filtered monthly international sunspot numbers $R_I$ as a function of 
the considered reference period. Right: Corresponding values for the average lengths 
of diagonal (solid) and vertical (dashed) structures in the recurrence plots. Lower 
panels: The same quantities computed for the z-component of the first Rössler system 
from Sect. 2.3 in the phase synchronised coherent state ($\mu = 0.05$). High values 
indicate a more deterministic dynamics of the system. 


3.3 Phase Coherence of Different Sunspot Observables 

In order to check the consistency between the daily values of the international 
sunspot numbers and the more extended Austrian-Slovakian composite catalogue, 
Temmer et al. [58] studied the corresponding scatter plots. Although 
a remaining scatter can be observed in the daily data, a linear correlation 
coefficient of 0.99 indicates that both quantities actually coincide very well. 
However, comparing the monthly sunspot areas with the 
respective sunspot numbers on both hemispheres, it turns out that the scatter 
is much larger (see Fig. 6). In particular, the linear correlation coefficient between 
sunspot areas and numbers is “only” 0.968 (0.860 and 0.824 for the northern 
and southern hemisphere, respectively) in the case of the international 
sunspot numbers (with only 15 years of joint coverage) and 0.974 (0.958 and 
0.956 for both hemispheres) in the case of the extended catalogue (60 years). 
Although this agreement appears to be quite reliable, it leaves some room 
for possible inconsistencies. 

Fig. 6: From left to right: Comparison of the total, northern, and southern hemispheric 
sunspot areas (x-axis) and sunspot numbers (y-axis) for the extended catalogue 
of [58] (top, 1945-2004) and the international sunspot number (bottom, 
1992-2006). 


In order to examine whether there are differences in the joint dynamics, 
the behaviour of the corresponding phases has been compared for different 
reference frequencies, using the wavelet decomposition as described in Sect. 2. 
The results presented in Fig. 7 clearly demonstrate that the dynamics of the 
different observables at time scales within a range of about 8-14 years can 
be considered phase-coherent. In this interval, the phases of sunspot numbers 
and areas coincide very well with each other (see Fig. 8), with a maximum 
phase shift of about 3 months over the last 60 years (which means less than 
3% of the average duration of a sunspot cycle). A more detailed inspection 
reveals that within this time interval, there has been a gradual change from 
conditions where the phase variations of the sunspot areas occur earlier than 
those of the sunspot numbers towards the opposite conditions. Looking at 
longer time scales using the continuous record of the international sunspot 
number, it turns out that this gradual trend indeed sets in around 1950. 
Considering the history of roughly the last 130 years, one finds that before 1890 
and since about 1970, the decadal-scale variations of the sunspot numbers 
occur earlier than those of the sunspot areas (however, these numbers depend 
slightly on the considered time scales). 

Fig. 7: Frequency-dependent phase coherence between the monthly sunspot numbers 
from the Austrian-Slovakian composite catalogue and the sunspot areas for the entire 
Sun (solid lines) and the values for the northern (dashed) and southern (dotted) 
hemispheres. 

In contrast to the longer periods, on shorter time scales, there are remarkable 
deviations in the phases of sunspot numbers and areas (see Fig. 7). In 
particular, one has to conclude that the relationship between both types of 
observables cannot be exclusively described by a monotonic (possibly nonlinear) 
transformation, but involves irregular short-term contributions. The 
presented analysis does not reveal whether the relative scatter between 
sunspot numbers and areas due to fluctuations on shorter time scales 
is mainly an effect of “observational noise” (which would be smoothed out 
when going to longer time scales) or of dynamically relevant deterministic or 
stochastic processes that act on short scales in both time and space. 


4 The North-South Asymmetry of Solar Activity 


Since observational records with a sufficient time coverage have become avail¬ 
able, there has been an increasing interest in the spatial structure of sunspot 
activity. Among the first to study the corresponding hemispheric asymmetry, 
Newton and Milsom [87] introduced a very simple asymmetry index, 


NA = 


N-S 
N+S ’ 


(14) 


where N and S are the values of the respective observables (i.e., sunspot areas 
or numbers) on the northern and southern hemisphere of the Sun. They found 
that the asymmetry “changes from cycle to cycle, which although not random, 
appear to have no definite period”. In addition to their results, Waldmeier 
[88] provided evidence that the phase shift between the activity on both 
hemispheres is another important parameter, which may have a certain 
influence on the values of NA. The interrelationship between the variations of 
the north-south asymmetry and major solar flares has been addressed by a 
number of authors [89, 90, 91, 92]. More detailed statistical studies on the 
long-term variability were subsequently published and subjected to intensive 
discussions [93, 94, 95, 96, 97, 98, 99, 100, 101, 102]. Whereas some authors 
focussed on the question whether there are significant periodicities inside the 
asymmetry index [103, 104, 105, 106, 107, 108], increasing efforts have been 
made to study whether the variations of the asymmetry can be understood 
as a chaotic process [105, 109]. Ballester et al. concluded that the results of 
the corresponding studies may be statistically more reliable if the absolute 
asymmetry 

$$ AA = N - S \tag{15} $$

is considered instead of its normalised variant NA. 

Fig. 8: Estimated phase shift between monthly sunspot areas and sunspot numbers 
from the Austrian-Slovakian composite catalogue (left panel, 1945-2004) and 
the international sunspot numbers (right panel, 1875-2006) for different reference 
frequencies. The upper panels represent the results for all reference time scales 
between 8 and 14 years, whereas in the lower panels, only the phase shift on a scale of 
T = 10.75 years is displayed. All plotted curves have been subjected to a one-year 
moving average filter for smoothing. Dashed lines correspond to the cone of influence 
(with a width of $\sqrt{2}\,T$ on a scale T for a Morlet wavelet [110]), outside which the 
results of wavelet analysis are biased due to boundary effects. 
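
As a brief illustration of the two amplitude-based indices (14) and (15), the 
following minimal Python sketch computes NA and AA from a pair of hemispheric 
series. The function name `asymmetry_indices`, the NaN convention for months 
with N + S = 0, and the numerical values are assumptions made for this example 
only; they are not taken from the studies cited above. 

```python
import numpy as np

def asymmetry_indices(north, south):
    """Return (NA, AA) for two hemispheric activity series.

    NA = (N - S) / (N + S) is the normalised index of Eq. (14),
    AA = N - S is the absolute asymmetry of Eq. (15).
    """
    n = np.asarray(north, dtype=float)
    s = np.asarray(south, dtype=float)
    total = n + s
    aa = n - s
    # Assumption: where N + S = 0 the normalised index is undefined,
    # so NA is set to NaN there instead of dividing by zero.
    na = np.full_like(total, np.nan)
    np.divide(aa, total, out=na, where=total != 0)
    return na, aa

# Made-up illustrative values, not observational data.
north = [30.0, 0.0, 10.0]
south = [10.0, 0.0, 30.0]
na, aa = asymmetry_indices(north, south)
```

The undefined values of NA during phases without any activity are one simple 
way to see why the absolute variant AA may behave better statistically. 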

Most of the aforementioned studies have considered the sunspot areas, 
whereas fewer results have been obtained based on sunspot numbers due to 
the poorer time coverage of the corresponding hemispherically resolved records 
[58, 93, 100, 109]. In this section, the meaning of phase shifts between northern 
and southern hemispheric activity (i.e., the corresponding sunspot areas or 
numbers) for the north-south asymmetry will be further examined, following 
the corresponding ideas originally pointed out by Waldmeier. The approaches 
that are used in the following potentially yield new measures of the 
north-south asymmetry, which do not directly compare to NA or AA in that they 
are based on a fundamentally different assumption, namely the importance of 
phase differences between the oscillations on both hemispheres. A thorough 
comparison of the variability of both traditional and phase-based quantities 
will be subject to future studies. 

4.1 Scale-Resolved Phase Coherence Analysis 

In a recent paper [111], the wavelet-based approach to phase coherence analysis 
has been applied to the full record of monthly values of hemispheric sunspot 
areas between 1874 and 2006. It has been shown that in a range of about 8-14 
years, the variations of sunspot areas on both solar hemispheres allow the 
definition of proper phase variables. Figure 9 underlines that this result holds for 
both sunspot areas and sunspot numbers. The observed interval of reference 
periods on which phase coherence is found matches well the results of the 
previous section on the phase coherence between different observables as well 
as those on the presence of low-dimensional chaos [86]. 

Within the coherent range, the phase variables and their corresponding 
differences vary on rather long time scales. In particular, it has been shown 
that before about 1925 and since about 1965, the sunspot activity in the 
northern hemisphere occurred earlier than that in the southern hemisphere, 
whereas in between, the opposite was the case. 

Fig. 9: Frequency-dependent phase coherence between the sunspot areas $A_{n,s}$ (left 
panel) and between the sunspot numbers from the composite catalogue $R_{n,s}$ (right 
panel) on both solar hemispheres, quantified by the mean resultant length (4). 

Fig. 10: Left: Phase differences between the sunspot areas $A_{n,s}$ (upper panel) and 
sunspot numbers $R_{n,s}$ (lower panel) for time scales between 8 and 14 years estimated 
with a complex Morlet wavelet. Right: Respective phase differences on a scale of 
T = 10.75 years. Note the different scalings on the time axis. Dashed lines 
correspond to the cone of influence. 

In Fig. 10, the phase difference time series for the hemispheric sunspot 
areas and numbers are shown. In particular, the behaviour on a reference 
time scale of T = 10.75 years is shown, for which the drift of the phase variables 
is minimised (average frequency) [111]. The corresponding results for 
the sunspot areas have already been presented in [111]; however, the remaining 
short-term fluctuations obtained here are significantly smaller. The reason for 
this is that in [111], the phases have been obtained via a Hilbert transform of 
the real part of the wavelet-filtered series (which may involve some numerical 
errors), whereas in this contribution, both real and imaginary parts from 
the wavelet decomposition have been directly used for computing the phases. 
However, the underlying pattern is fully equivalent. In order to exclude possibly 
erroneous results at the boundaries of the record, the corresponding cones 
of influence [110] have been computed, which give a rough estimate of the 
part of the wavelet-filtered time series that is statistically reliable. 
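
The procedure described above, i.e., taking the phase directly from the real 
and imaginary parts of a complex Morlet wavelet coefficient at one reference 
period, can be sketched in Python as follows. This is a minimal illustration 
under stated assumptions rather than the implementation used for Fig. 10: the 
centre frequency `OMEGA0 = 6`, the truncation of the kernel at four scale 
units, the function names, and the synthetic test signals are all choices made 
for this example, and the mean resultant length is assumed to have its 
standard form, the modulus of the average of exp(i Δφ). 

```python
import numpy as np

OMEGA0 = 6.0  # assumed Morlet centre frequency (a common default)

def morlet_phase(x, dt, period):
    """Instantaneous phase of x at one reference period (complex Morlet wavelet)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Wavelet scale whose equivalent Fourier period equals `period`.
    scale = period * (OMEGA0 + np.sqrt(2.0 + OMEGA0**2)) / (4.0 * np.pi)
    half = int(np.ceil(4.0 * scale / dt))  # truncate the Gaussian envelope
    eta = np.arange(-half, half + 1) * dt / scale
    kernel = np.pi**-0.25 * np.exp(1j * OMEGA0 * eta - 0.5 * eta**2)
    # For the Morlet wavelet conj(psi(-eta)) equals psi(eta), so the transform
    # at this scale is an ordinary convolution with the kernel.
    full = np.convolve(x, kernel) * dt / np.sqrt(scale)
    coeff = full[half:half + x.size]  # align coefficients with the input samples
    # Real and imaginary parts are used together; no Hilbert transform is needed.
    return np.angle(coeff)

def phase_difference(phi_a, phi_b):
    """Phase difference wrapped to (-pi, pi]; positive if series a leads series b."""
    return np.angle(np.exp(1j * (np.asarray(phi_a) - np.asarray(phi_b))))

def mean_resultant_length(dphi):
    """Modulus of the mean of exp(i * dphi); 1 for a constant phase difference."""
    return np.abs(np.mean(np.exp(1j * np.asarray(dphi))))

# Synthetic check (not observational data): two sinusoids with a 10.75-year
# period, sampled monthly, the second lagging the first by one year.
dt = 1.0 / 12.0
t = np.arange(0.0, 130.0, dt)
north = np.cos(2.0 * np.pi * t / 10.75)
south = np.cos(2.0 * np.pi * (t - 1.0) / 10.75)
dphi = phase_difference(morlet_phase(north, dt, 10.75),
                        morlet_phase(south, dt, 10.75))
inner = slice(600, 960)  # samples well away from both record boundaries
r = mean_resultant_length(dphi[inner])
```

In this synthetic case the phase difference in the interior of the record 
should be close to 2π/10.75 rad, corresponding to the imposed lag of one year. 
Values near the boundaries are affected by edge effects and would have to be 
masked by the cone of influence. 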

Examining the inferred phase shift variability in some more detail, the 
same qualitative behaviour is found for sunspot numbers and areas. As a 
particularly relevant feature, there is a steep increase of the phase difference 
between about 1960 and 1975, where conditions with the activity occurring 
significantly earlier on the southern hemisphere than on the northern one are 
replaced by the opposite behaviour within only about one solar cycle. 

4.2 Topological Phase Coherence Analysis 

The wavelet-based method discussed above requires the explicit selection of a 
distinct reference frequency. As a potential alternative, Zolotova and Ponyavin 
[112, 113] suggested using the line of synchronisation (LOS) [44] in the 
cross-recurrence plot as an indicator of the phase shift of sunspot activity on both 
hemispheres. Indeed, the presence of such a continuous structure can be 
interpreted as a signature of time-scale synchronisation, i.e., a coherent dynamics 
of the two considered time series if the relative time scales are adjusted in 
a corresponding way. Some successful applications of this method have been 
demonstrated with respect to palaeoclimatic time series [44, 114, 115], where 
the corresponding problem of adjusting age-depth models to different 
sedimentary or ice core records is very typical [116, 117, 118]. 

In [111], it has already been argued that in the case of the sunspot activity 
records, the LOS approach might however have been misleading. One major 
point of criticism is that the algorithm for finding an “optimal path” through 
a cross-recurrence plot is not very robust. This is also one particular reason 
why the LOS technique has not yet become standard in palaeoclimatology, 
where traditional methods of “sequence slotting” [116, 117] are still used. 
In particular, using unthresholded recurrence plots (distance plots) 
$D_{ij} = \|X_i - Y_j\|$, i.e., a matrix of pairwise distances between all 
observations in both considered time series, instead of cross-recurrence plots 
would be a much more natural approach, for which powerful dynamic programming 
algorithms are available [119]. Moreover, the choice of threshold values 
$\varepsilon$ and the specific norm may have an influence on the estimated LOS.¹ 
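
To indicate what a dynamic programming search on an unthresholded distance 
plot could look like, the following Python sketch finds a minimum-cost 
monotonic path through the matrix of pairwise distances. It uses the standard 
dynamic-time-warping recursion, which is an assumption made for illustration 
and not necessarily one of the algorithms referred to in [119]. The function 
names `distance_matrix` and `optimal_path` and the test series are 
hypothetical. 

```python
import numpy as np

def distance_matrix(x, y):
    """Unthresholded cross-recurrence (distance) plot for scalar series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.abs(x[:, None] - y[None, :])

def optimal_path(dist):
    """Minimum-cost monotonic path from (0, 0) to the opposite corner."""
    n, m = dist.shape
    cost = np.full((n, m), np.inf)
    cost[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: vertical, horizontal and diagonal steps.
            best = min(
                cost[i - 1, j] if i > 0 else np.inf,
                cost[i, j - 1] if j > 0 else np.inf,
                cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            cost[i, j] = dist[i, j] + best
    # Backtrack from the end point; ties are resolved in favour of the
    # diagonal step, so neither the horizontal nor the vertical one is preferred.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1], float(cost[-1, -1])

# Made-up example: y is x delayed by one sample.
x = [0, 1, 2, 3, 2, 1, 0]
y = [0, 0, 1, 2, 3, 2, 1]
path_shifted, cost_shifted = optimal_path(distance_matrix(x, y))
path_same, cost_same = optimal_path(distance_matrix(x, x))
```

The offset j - i along the returned path plays the role of the time shift that 
the LOS is meant to capture. Because the path minimises a global cost, it 
depends on amplitudes as well as on phases. 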

¹ It has to be mentioned that there is another drawback of the path finding 
algorithm for estimating the LOS [44, 42] which has been implemented in the Matlab 
CRP toolbox used in this study as well as in the work of Zolotova and Ponyavin 
[112, 113]: There is a clear preference towards horizontal steps compared to 
vertical ones [120], which may lead to systematic trends in the estimated LOS. 

Besides the rather weak robustness against perturbations of the data, there 
is the conceptual problem that, contrary to the claim of Zolotova and Ponyavin [112], 
the LOS pattern is not necessarily representative of the phase shift, as the 
definition of cross-recurrence plots implies a dependence on both phase and 
amplitude of the considered time series. However, there are some possible 
modifications of the cross-recurrence plot method that may contribute to a 
solution of this problem: 

1. Relative recurrences. Instead of using the standard definition of 
cross-recurrence plots, one may consider normalised distances, for example, by 
setting 



(16) 


Here, $X_i$ and $Y_j$ correspond to the values of the particularly considered 
observable (here, sunspot numbers or areas) in the northern and southern 
solar hemisphere at times $t_i$ and $t_j$, respectively. The resulting matrix CR 
might be referred to as relative cross-recurrence matrix, and its graphical 
representation as a relative cross-recurrence plot. The term relative 
recurrence is motivated by the fact that the ratio in the argument of the 
Heaviside function may only have values between 0 and 1, which yields a 
normalised range for the possible threshold values $\varepsilon$. As an unthresholded 
version, one may also define relative distance matrices in a similar way as 



(17) 


2. Dynamic encoding and symbolic recurrences. A second possibility 
to make the results of the LOS method more robust against 
small fluctuations of the time series values is applying a coarse-graining 
(equivalent to a symbolic encoding) to the record before computing the 
recurrence matrix. This coarse-graining may be static (i.e., based on the 
order relationships of the observations with respect to a set of predefined 
threshold values), dynamic (i.e., either based on the order relationships of 
subsequent subsets of observations or a static encoding of the difference 
filtered time series), or a mixed form. Generalising the recently proposed 
concepts of order patterns recurrence plots [121, 122, 123] and ordinal 
recurrence plots [42, 122], these approaches may be understood as symbolic 
recurrence plots. In order to make the results as sensitive to phase signals 
as possible, the use of a dynamic encoding appears to be most promising. 
In particular, the concept of order patterns recurrence plots might be 
especially suited for this purpose. 
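
The two modifications listed above can be illustrated by the following Python 
sketch. It rests on explicit assumptions: the normalisation 
|X_i - Y_j| / (X_i + Y_j) for non-negative observables is only one possible 
choice that satisfies the stated property of a ratio between 0 and 1 and is not 
claimed to coincide with Eqs. (16) and (17); the dynamic encoding is a simple 
order-pattern code in the spirit of [121, 122, 123]; all function names are 
hypothetical. 

```python
import numpy as np

def relative_distance_matrix(x, y):
    """Assumed relative distance |x_i - y_j| / (x_i + y_j) for x, y >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = np.abs(x[:, None] - y[None, :])
    total = x[:, None] + y[None, :]
    rel = np.zeros_like(diff)
    # Assumption: pairs with x_i + y_j = 0 are assigned the distance 0.
    np.divide(diff, total, out=rel, where=total > 0)
    return rel

def relative_cross_recurrence(x, y, eps):
    """Thresholded version: 1 where the relative distance does not exceed eps."""
    return (relative_distance_matrix(x, y) <= eps).astype(int)

def order_pattern_symbols(x, dim=3):
    """Dynamic encoding: code of the rank order in each window of `dim` values."""
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, dim)
    perms = np.argsort(windows, axis=1, kind="stable")
    # Each permutation is mapped to a unique integer (base-`dim` digits).
    return perms @ (dim ** np.arange(dim))

def symbolic_cross_recurrence(x, y, dim=3):
    """1 where the order patterns of both series coincide."""
    sx = order_pattern_symbols(x, dim)
    sy = order_pattern_symbols(y, dim)
    return (sx[:, None] == sy[None, :]).astype(int)

# Made-up values for illustration only.
x = np.array([1.0, 3.0])
y = np.array([1.0, 1.0])
rel = relative_distance_matrix(x, y)
crp = relative_cross_recurrence(x, y, eps=0.1)
z = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
symbols = order_pattern_symbols(z, dim=2)
```

Both constructions reduce the dependence on amplitude: the assumed relative 
distance is unchanged if both series are multiplied by the same positive 
factor, and the order patterns are invariant under any strictly increasing 
transformation of the data. 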

It will be a subject of further studies whether the two mentioned approaches 
are indeed able to derive meaningful phase shifts between oscillatory 
signals from time series. In order to test for this, more robust and reliable 
algorithms for the estimation of the LOS are required, which are currently 
not yet available. 
4.3 Significance Test 

In order to prove the relevance of the derived phase difference time series, the 
concept of “natural surrogates” [124] has been proposed as a potential basis 
[111]. Here, a signal is used as a surrogate of the considered time series that 
represents the same or some essentially similar dynamical system. In the case 
of the sunspot areas, the consideration of records of the international sunspot 
number from earlier time intervals has been suggested. The results of Sect. 3.3 
may be considered as a validation of this approach, as the phase shift between 
the total sunspot areas and numbers on both hemispheres is much smaller 
than that between the records on northern and southern hemisphere. The fact 
that the resulting phase difference time series of the original data lies outside 
the confidence levels obtained from the natural surrogates [111] demonstrates 
that the joint phase diffusion of the time series from both hemispheres deviates 
remarkably from the corresponding phase shift with respect to a structurally 
equivalent independent signal. In order to test whether phase coherence (in 
the sense of synchronisation) is actually present in the hemispheric sunspot 
areas and the corresponding sunspot numbers, a sophisticated statistical test 
would have to be applied to one of the resulting phase coherence measures. A 
corresponding bootstrap approach to quantifying the significance of the mean 
resultant length (4) has been recently proposed by Allefeld and Kurths [39]. 
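
The comparison of a phase difference series with confidence levels derived 
from surrogates can be sketched as follows. This is a deliberately simple, 
pointwise percentile band under stated assumptions: it ignores the circular 
nature of phases and multiple testing, and it is neither the procedure of 
[111] nor the bootstrap approach of [39]. The function name 
`outside_surrogate_band` and all numbers are hypothetical. 

```python
import numpy as np

def outside_surrogate_band(dphi, surrogate_dphi, level=0.95):
    """Flag times at which dphi leaves the pointwise surrogate band.

    dphi           : phase difference series of the original data, shape (n_times,)
    surrogate_dphi : phase difference series obtained with surrogates,
                     shape (n_surrogates, n_times)
    level          : two-sided confidence level of the band
    """
    dphi = np.asarray(dphi, dtype=float)
    sur = np.asarray(surrogate_dphi, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lower = np.percentile(sur, tail, axis=0)
    upper = np.percentile(sur, 100.0 - tail, axis=0)
    return (dphi < lower) | (dphi > upper), lower, upper

# Made-up numbers: 21 surrogate values per time step, evenly spread in [-1, 1].
surrogates = np.tile(np.linspace(-1.0, 1.0, 21)[:, None], (1, 4))
observed = np.array([0.0, 2.0, -2.0, 0.5])
flags, lower, upper = outside_surrogate_band(observed, surrogates)
```

In this toy example, the second and third values lie outside the 95% band 
spanned by the surrogates, whereas the first and last do not. 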

According to the above considerations, the derived phase shift pattern is 
actually significant and may be considered as a proxy for the north-south 
asymmetry of solar activity. The fact that the phase difference time series 
deviates from the series computed using natural surrogates from earlier time 
intervals is most likely an effect of a particular type of nonstationarity of the 
system, which is related to a gradual change of the average frequency of 
oscillations. On the one hand, results have been reported that the solar activity of 
the last decades has been unusually strong [125, 126]. However, these findings 
have mainly been related to the magnitude of solar activity. On the other 
hand, Duhau [127, 128] interpreted pronounced increases of the solar activity 
between 1923 and 1949 and after 1993 as signatures of phase catastrophes 
associated with a breakdown of centennial scale oscillations (Gleissberg cycle) 
of the solar activity. It has been pointed out that similar dynamics can be 
regarded as a feature of many chaotic oscillators. The question whether the 
presence of such a speculative phase catastrophe can be considered a 
reason for the results of the significance test has to be further investigated in 
future studies, possibly considering other types of surrogates. 


5 Discussion 

This chapter has summarised some recently developed approaches for testing 
the phase coherence of different oscillatory modes in the dynamics of complex 
systems. In particular, it has been argued that for inferring phase difference 
patterns, at the present state of research, the use of methods based on an 
explicit phase definition (e.g., via wavelet or Hilbert transforms) is superior to 
purely topological methods like the LOS method. However, the latter class of 
approaches offers a high potential for future developments, which is underlined 
by the successful introduction of synchronisation indices based on recurrence 
plots (see Sect. 2.2). 

As a particular application, the classical problem of decadal-scale oscillations 
of solar activity has been considered. It has been demonstrated that 
phase coherence between both solar hemispheres is present on time scales 
between about 8 and 14 years, whereas on shorter scales, irregular components 
contribute differently to the respective dynamics. With respect to the 
identification problem discussed in Sect. 2, it has to be underlined that further 
development of empirically motivated physical models is necessary in order 
to attribute this coherent behaviour to phase synchronisation of two distinct 
oscillatory components or two observations of the same oscillatory system. 

The long-term variability of the phase difference between the activity on 
both solar hemispheres has been derived for roughly the last 130 years. It has 
been shown that the inferred pattern does not significantly depend on the 
choice of a reference frequency or of the sunspot areas or numbers as the 
considered observables. Very likely, the inferred phase shift is a major contributor 
to the north-south asymmetry of solar activity, which supports suggestions 
going back to Waldmeier 50 years ago [88]. Referring to the most recent 
literature, one has to mention that the presented results are not in accordance 
with the findings of Zolotova and Ponyavin [112, 113] who reported a much 
more irregular phase variability pattern with much larger phase differences. 
Following the arguments given in Sect. 4.2, the results of the wavelet-based 
approach used in this study are more reliable than those of the mentioned 
references, where the very unstable LOS method has been used. In addition, one 
may criticise that Zolotova and Ponyavin attributed the presence of a 
corresponding different phase dynamics on both solar hemispheres to the presence 
of “phase asynchronisation” [112, 113, 129]: Apart from the fact that the use 
of synchronisation terms is rather doubtful in the considered problem, the 
concept of synchronisation refers to a process rather than a state [21], i.e., the 
term “asynchronisation” is physically meaningless. 

In this contribution, wavelet decomposition has been used to derive coherent 
oscillatory signals for which a phase variable can be defined. A potential 
alternative would be the consideration of empirical mode decomposition [113] 
or similar methods that go beyond the requirement of a fixed reference 
frequency of oscillations. However, it has to be noted that in such a case, the 
particular physical meaning of the inferred modes has to be carefully examined 
before using them for further analyses. 

Finally, one has to mention that the north-south asymmetry of solar 
activity has traditionally been attributed to different amplitudes of the 
corresponding observables. In this work as well as a number of subsequent 
contributions [111, 112, 113, 129], first attempts have been made to use temporally 
varying phase differences between decadal-scale oscillations as an alternative 
way of quantitatively describing this asymmetry. Using the information on 
phase shifts, one may adjust the corresponding time series to equal phases 
for considering the effect of different amplitudes separately from the phase 
dynamics. A corresponding data-adaptive redefinition of asymmetry indices 
and their thorough analysis is left to future studies. 